Dec 12, 2023

Monitoring CrowdSec with Prometheus and Grafana

Monitoring is a key part of any robust cybersecurity tool, allowing end users to maintain optimal performance, detect issues, and gain insight into what is happening across their attack surface.

It’s no different when it comes to CrowdSec. By learning how to monitor your CrowdSec deployment, you can ensure you maintain optimal performance of your Security Engines, so they continue to operate and protect your online infrastructure as expected.

Using Prometheus to monitor CrowdSec

At CrowdSec, we rely on Prometheus, an open source monitoring tool, to let you keep a close eye on your deployments. Prometheus client libraries are integrated into the Security Engine, providing out-of-the-box monitoring and valuable insight into log processing, scenario triggers, performance, and resource usage.

Access to these metrics lets you confirm that your data sources are properly configured and verify that logs are being parsed successfully, giving you peace of mind that your Security Engine is performing as intended and working behind the scenes to keep your systems and networks safe.

Accessing CrowdSec metrics

Now, let’s get started and take a look at how you can grab the metrics that you need.

Running the following CSCLI command returns an overview of the CrowdSec metrics provided by the Prometheus client, split into four sections: acquisition metrics, parser metrics, bucket metrics, and Local API metrics.

sudo cscli metrics
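
Because these figures come from the Prometheus client, they can also be read programmatically. The sketch below parses metrics in the Prometheus text format into a Python dictionary. It rests on several assumptions that you should check against your own deployment: that the Security Engine serves its metrics at http://127.0.0.1:6060/metrics, and that metric names such as cs_parser_hits_total exist with the labels shown. The sample text and its values are made up for illustration and are not real CrowdSec output.

```python
import re
import urllib.request
from collections import defaultdict

# One sample line: name, optional {labels}, value, optional timestamp.
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'
    r'(?:\{(?P<labels>.*)\})?'
    r'\s+(?P<value>\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Made-up sample; metric names and labels are assumptions, not captured output.
EXAMPLE_TEXT = """\
# HELP cs_parser_hits_total Illustrative help text.
# TYPE cs_parser_hits_total counter
cs_parser_hits_total{source="/var/log/auth.log",type="syslog"} 120
cs_parser_hits_total{source="/var/log/nginx/access.log",type="nginx"} 300
cs_parser_hits_ok_total{source="/var/log/auth.log",type="syslog"} 45
"""


def fetch_metrics(url="http://127.0.0.1:6060/metrics", timeout=5):
    """Download the raw metrics text. The default URL is an assumption."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8")


def parse_exposition(text):
    """Return {metric_name: [(labels_dict, value), ...]} from text-format metrics."""
    samples = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip blanks, HELP and TYPE lines
            continue
        match = SAMPLE_RE.match(line)
        if match is None:  # ignore anything that is not a sample line
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples[match.group("name")].append((labels, float(match.group("value"))))
    return dict(samples)


def total(samples, name):
    """Sum a metric across all of its label combinations."""
    return sum(value for _, value in samples.get(name, []))


parsed = parse_exposition(EXAMPLE_TEXT)
print(total(parsed, "cs_parser_hits_total"))
```

To try this against a live Security Engine, pass the result of fetch_metrics() to parse_exposition() instead of EXAMPLE_TEXT.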

Acquisition metrics

The first section of the metrics output is the Acquisition Metrics, providing you with a list of the data sources that have been configured in CrowdSec. This output gives you a view of the parsing pipeline for each source, showing the total number of lines read, the number of lines that have been parsed or remain unparsed, and how many of these parsed events have been poured into a detection scenario bucket.

When monitoring your acquisition metrics, keep in mind that, by design, CrowdSec will only attempt to parse log lines that are relevant to your environment, so it is normal to have some unparsed lines for certain services. Our SSHD parser, for example, will only capture failed authentication attempts from your auth logs and successful authentications will not be parsed.

Monitoring these acquisition metrics is a great way to ensure that your data sources are properly configured and that they are producing logs that can be parsed by the Security Engine’s log processors.
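
As a rough illustration of how you might read these counts, the sketch below takes the figures for one data source and reports the two cases that most often point to a configuration problem: a source with no lines read, and a source whose lines are read but never parsed. The SourceStats type, the diagnose function, and the wording of the messages are hypothetical and not part of CrowdSec. Since some unparsed lines are normal, the sketch deliberately does not flag a low parsed ratio on its own.

```python
from dataclasses import dataclass


@dataclass
class SourceStats:
    """Counts for one data source, copied by hand from the acquisition section."""
    name: str
    lines_read: int
    lines_parsed: int
    lines_unparsed: int
    lines_poured: int


def diagnose(stats: SourceStats) -> str:
    """Return a short, human-readable verdict for one data source."""
    if stats.lines_read == 0:
        return "no lines read: check the path and that the service is logging"
    if stats.lines_parsed == 0:
        return "nothing parsed: check that a matching parser is installed"
    ratio = stats.lines_parsed / stats.lines_read
    return f"ok: {ratio:.0%} of lines parsed"


# Hypothetical numbers, for illustration only.
auth_log = SourceStats("/var/log/auth.log", 200, 50, 150, 10)
print(diagnose(auth_log))
```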

Parser metrics

While the acquisition metrics provide insight into the proper configuration of data sources, the Parser Metrics provide you with information about the performance of individual parsers running on your Security Engines.

These metrics give you a view into total Hits, indicating the number of events the parser has attempted to parse, as well as how many of those events were successfully parsed or not.

Monitoring parser metrics helps you evaluate how efficiently each parser is working and identify any issues or patterns related to events that could not be parsed.

Bucket metrics

The CrowdSec Security Engine detects malicious behaviors thanks to several bucket algorithms that form the basis of our detection scenarios. Parsed events are poured into these scenarios, and when the scenario’s bucket is full and overflows, an alert is triggered.
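
To make the idea of pouring and overflowing concrete, here is a minimal sketch of a generic leaky bucket. It is a simplified teaching model, not CrowdSec's actual implementation: the LeakyBucket class, its parameter names, and the choice to empty the bucket after an overflow are all assumptions made for this example.

```python
class LeakyBucket:
    """Simplified leaky bucket.

    Holds up to `capacity` events and leaks one event every
    `leak_interval` seconds. Pouring an event into a full bucket
    makes it overflow.
    """

    def __init__(self, capacity, leak_interval):
        self.capacity = capacity
        self.leak_interval = leak_interval
        self.level = 0.0
        self.last_seen = None

    def pour(self, timestamp):
        """Pour one event at `timestamp` (seconds). Return True on overflow."""
        if self.last_seen is not None:
            # Drain whatever leaked out since the previous event.
            leaked = (timestamp - self.last_seen) / self.leak_interval
            self.level = max(0.0, self.level - leaked)
        self.last_seen = timestamp
        self.level += 1
        if self.level > self.capacity:
            self.level = 0.0  # in this sketch, an overflow empties the bucket
            return True
        return False


# Six events one second apart: the sixth overflows a bucket of capacity 5.
burst = LeakyBucket(capacity=5, leak_interval=10)
print([burst.pour(t) for t in range(6)])

# One event every 20 seconds leaks away faster than it arrives: no overflow.
trickle = LeakyBucket(capacity=5, leak_interval=10)
print(any(trickle.pour(t) for t in range(0, 200, 20)))
```

In this model, a burst of events fills the bucket faster than it leaks and triggers an overflow, while a slow trickle never does.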

Bucket Metrics give you a comprehensive overview of how the different events happening on your networks are evaluated by these scenarios.

This breakdown includes statistics on how your detection scenarios are being used: how many times the buckets for each scenario have overflowed (triggering an alert), the total number of buckets created per scenario, the number of events that have been poured into these buckets, and the number of expired buckets, which are those that did not overflow despite events being poured into them.

Local API metrics

By leveraging Prometheus, we are also able to expose a number of Local API metrics for you. The CrowdSec Local API is responsible for the sending and receiving of crowdsourced threat intelligence within the CrowdSec network.

These Local API metrics provide a summary of specific API routes, their access methods, and the number of requests received.

You also get a breakdown of which machines are handling the API requests, giving valuable insight into load distribution, as well as a summary of which Remediation Components (bouncers) have been acting on intelligence received by the API.

Monitoring these metrics allows you to optimize resource allocation and ensure reliable API responses.

Retaining metrics with a Grafana Dashboard

Running the CSCLI metrics command will only show you metrics from the time your Security Engine was started. If, for whatever reason, your Security Engine is restarted, these metrics will be reset.

Fear not, though. Believe me when I say the CrowdSec team knows how important metric retention can be for SOC teams and the like. That’s why we have created a hands-on lab on our free learning academy to show you how you can build a Grafana Dashboard to ensure the retention of these metrics, even if your Security Engine is restarted.
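
To see why storing scraped values over time solves the reset problem, consider a counter that drops back to zero when the process restarts. The sketch below adds up a counter's growth across such resets by treating any drop in value as a restart. This is a simplified version of the idea behind Prometheus functions such as increase(), without their extrapolation, and the function name and sample values are hypothetical.

```python
def total_increase(samples):
    """Sum the growth of a counter over time, surviving resets.

    `samples` is a sequence of counter values in scrape order.
    A value lower than the previous one is treated as a restart
    from zero, so the whole new value counts as growth.
    """
    total = 0.0
    previous = None
    for value in samples:
        if previous is not None:
            if value >= previous:
                total += value - previous
            else:
                total += value  # counter was reset between scrapes
        previous = value
    return total


# Hypothetical scrapes: the counter restarts between 25 and 3.
print(total_increase([0, 10, 25, 3, 8]))
```

Here the counter grows by 25 before the restart and by 8 after it, so the total is 33, even though the last raw value is only 8.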

Head over to the CrowdSec Academy and learn how to build your own dashboard with Prometheus and Grafana to monitor the health and performance of your CrowdSec deployment.