CrowdSec is licensed under the MIT open-source license. A copy of the text follows:
“Copyright 2020, CrowdSec SAS (http://crowdsec.net),
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the “Software”), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED “AS IS”, WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.”
That’s all there is to it. Like in the case of Debian, you can do anything you want with it, for free, period.
You just need to embed this license when you redistribute the product.
What CrowdSec is about
CrowdSec is a security engine that uses behavior analysis on your logs to determine whether someone is trying to hack you. If your CrowdSec agent detects such an aggression, the offending IP is dealt with locally and sent for curation. If this signal passes curation, the IP is redistributed to all users sharing a similar technological profile to “immunize” them against it. The overall concept is to leverage the power of the crowd to create a form of Internet neighborhood watch. As for the IP that attacked your machine, you can remedy the threat in any manner you feel appropriate, using a Bouncer: it could be dropped by your firewall, receive a 403 from your Nginx, get a Captcha, or trigger 2FA or a script. It is your call. If any IP received through the reputation system is found knocking at your virtual doors, it gets the same treatment as one detected by the behavior engine. This dual approach, leveraging both behavior and reputation, makes a huge difference.
Sharing offending IPs (as well as receiving them) is optional and can be deactivated. Either way, we never export your logs.
When an IP is reported, we just get the Timestamp, the aggressive IP, and the triggered scenario.
Here are the keywords we use about CrowdSec:
- Watcher: A user who shares the IPs they block using the behavior engine
- Agent: The piece of software you can download from GitHub or install from packages and run on your Internet-exposed machines
- Alert: A clear behavior extracted from a log source by a scenario, upon which a bouncer can be activated
- Parser: Normalizes and enriches (geography, 3rd party whitelist, etc.) logs, signals & data
- Data source: Allows log, signal, or data acquisition (log file, rsyslogd, CloudTrail, MQTT, etc.)
- Scenario: A YAML file describing a behavior to identify aggressions
- Bouncer: A component enforcing the decision, which is set up in the local or online interface. The remedy can be a drop, captcha, MFA, privilege drop, rate or speed limiting, etc. It can work at any level: IP, session, or business logic. You could, for example, use a TCP wrapper in an IoT IDE to handle incoming connections, or send a Captcha from your Magento if an IP is marked as dangerous.
- Collection: A group of scenarios, parsers and data sources focused on a precise vertical (eCommerce), technical context (Magento) or generic template (LAMP)
- Cscli: The command line tool to interact with the daemon, dashboard and database
- Consensus: The group of algorithms and data sources contributing to the evaluation of an Alert. This is a server-side treatment (explained further in this FAQ), designed to avoid false positives & poisoning.
Server-side treatments and online dependency
Server-side treatments involve the following:
- Collecting information (IP / Timestamp / Scenario) sent by the network members accepting to share them
- Distributing curated IP blocklists, tailor-made for each member according to the choices they make in the back office (coming soon)
The reputation system (feeding your local daemon with IPs to block) can be deactivated and/or replaced by another source of reputation in the configuration. This makes the software able to function 100% standalone if you want absolutely no dependency on any online service. With the local API (LAPI, as of v1.0), agents can be deployed & configured 100% offline if you want to.
Data flow & data gathered
All logs are treated locally, on your servers, within your premises. None of them, at any time of the treatment, are exported to our servers.
1/ When CrowdSec connects to the online API, it sends the list of scenarios to which the user has subscribed, in order to get a tailor-made list of IPs to block.
2/ If an aggressive IP is detected by the local behavior engine, the following data (and only these) are sent back to our servers: [Timestamp] [IP] [Scenario]
[Timestamp]: The time and date at which the event occurred. This data allows us to correlate malicious IP activity across CrowdSec’s whole user community. We also use it to expire a ban decision after a certain time.
[IP]: The aggressive IP involved in an attack or aggressive behavior.
[Scenario]: The scenario triggered by the aggressive IP. We need this information to identify the given IP’s behavior and to evaluate whether the scenario is triggering false positives.
This is the absolute minimum we need to create a robust, global, IP reputation database.
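As an illustration only, the three-field signal described above could be modeled as follows. This is a minimal sketch, not CrowdSec's actual wire format: the `Signal` class, the `build_signal` and `to_json` helpers, the ISO 8601 timestamp format, and the scenario name used in the usage note are all assumptions made for the example.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone
import ipaddress
import json


@dataclass(frozen=True)
class Signal:
    """Hypothetical container for the only three fields the text says are shared."""
    timestamp: str  # time of the event (ISO 8601 in UTC is an assumption)
    ip: str         # the aggressive IP, IPv4 or IPv6
    scenario: str   # name of the scenario that was triggered


def build_signal(ip: str, scenario: str, when: datetime) -> Signal:
    # Validate and normalize the address; raises ValueError if it is malformed.
    normalized = str(ipaddress.ip_address(ip))
    return Signal(when.astimezone(timezone.utc).isoformat(), normalized, scenario)


def to_json(signal: Signal) -> str:
    # Serialize exactly the three fields, nothing from the logs themselves.
    return json.dumps(asdict(signal), sort_keys=True)
```

For example, `build_signal("2001:DB8::1", "example/ssh-bruteforce", datetime(2020, 1, 1, tzinfo=timezone.utc))` yields a signal whose only keys are `timestamp`, `ip`, and `scenario`.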
The software supports IPv6, and so do its API & bouncers.
The IP reputation system also applies to the IPv6 address space.
Data treatment, storage & security
Data is treated by our online servers. CrowdSec’s team is made up of former pentesters, DevOps, SecOps, and SecDevOps engineers, some having a decade of experience in secure hosting. This doesn’t mean we aren’t error-prone, but at the very least, we have decent field knowledge and standards. Our servers are secured and maintained. Obviously, being breached would deeply damage our reputation and the trust relationship with our community, hence this point is not taken lightly.
Even though we take all the measures we think are appropriate to protect our servers, should those collection servers be compromised, nothing vital transits through them. The data they gather is not sensitive, confidential, or private material. We would potentially miss some signals, but that’s pretty much it. The consensus servers, the ones deciding whether an IP is dangerous or not, are not publicly exposed and are also heavily secured to avoid any security breach.
The storage of this data isn’t exposed either and is only accessed through an internal API. If the data were to be wiped out, by accident or intentionally, the network would quickly regenerate a consensus within a few hours. In any case, the servers distributing the consensus (i.e. IP blacklists) do not contain any sensitive information either.
The “Watcher tier” consists of people using the software, sending us their signals and, in return, benefiting, for free, from the global, curated, IP reputation database. We send them, at very regular intervals, a list of IP addresses considered dangerous for them, which they can safely ban or regulate in any way they see fit. The IP blocklists and global database belong to us, but a full, unlimited right to use is granted to users, as long as they share the IPs they block through their CrowdSec instance(s).
The data sent by each instance of CrowdSec is only made of a timestamp, the scenario triggered, and the aggressive IP. This data, collected worldwide on our servers, is then curated to avoid false positives and poisoning, and then redistributed to each user sharing signals, based on their (self-declared) technological footprint. Hence, if you are running Magento for instance, you only get IPs that are harmful to Magento. This avoids overloading the machine with a very large ban list and also lowers the chances of a false positive.
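The footprint-based redistribution described above can be pictured as a simple filter. The sketch below is hypothetical: the function name `tailor_blocklist`, the `(ip, scenario)` pair representation, and the scenario names are assumptions for illustration, not the server-side implementation.

```python
from typing import Iterable, List, Tuple


def tailor_blocklist(
    curated: Iterable[Tuple[str, str]],
    subscribed_scenarios: Iterable[str],
) -> List[str]:
    """Return the sorted, de-duplicated IPs relevant to one user's footprint.

    `curated` holds (ip, scenario) pairs from the curated database (assumed shape);
    `subscribed_scenarios` is the self-declared list of scenarios the user runs.
    """
    wanted = set(subscribed_scenarios)
    return sorted({ip for ip, scenario in curated if scenario in wanted})
```

With a curated list containing a Magento-related entry and an SSH-related entry, a user subscribed only to the Magento scenario would receive only the first IP.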
This curated data is CrowdSec’s property and a usage right is given to users receiving an IP list. It can even be used outside of the context of CrowdSec. If you use CrowdSec and share the IPs you block with us, nothing prevents you from using the ban list you receive in your SIEM or other security tools.
The CrowdSec team is doing an expensive job by editing this codebase, running servers, and curating all those signals. Hence, this team (mainly composed of humans plus a few cats & alpacas) needs to be fed. Open source projects that are not financed often tend to decay since their authors have to arbitrate between their passion and their working hours. Here the team members’ daily work is to passionately produce this open-source software. It’s not a side job: they are paid for it, and we think this ensures long-term stability, efficiency, and quality.
To pay the wages, company costs, and R&D effort, we monetize CrowdSec data in the least aggressive way possible. In short, anyone can use CrowdSec without sharing data, and that is fine by us. They do not get the community signals though. We call them the “free tier”. The “Watcher tier” gets the IP reputation service entirely for free.
Two monetization plans are being studied right now. These paying features are basically added value, which costs us money to create and operate. Typically, the “Premium tier” offers better support, self-monitoring (of your own IPs, to see if any get compromised), and cold log analysis, which allows you to use the IP reputation DB for forensics. This last activity implies that we keep a history of how an IP behaved in the past and correlate this information with your log timestamps, hence taking space on our storage.
The “Enterprise tier” offers the same benefits as the premium tier plus fleet management features. Typically this plan is made for companies handling hundreds of exposed endpoints, administration IPs, VPNs, websites, apps, etc. They can centrally define several filtering profiles and enforce them on a large scale, from a single back office. This plan also includes a private consensus, where CrowdSec Agents belonging to the same machine group can ban IPs targeting only one precise customer, hence not visible in the global database, but that could be identified locally.
The “API tier” will simply query the API to get the reputation of a given IP they are about to peer with. They don’t share any signals with us, hence they pay to get access to this data. We want to create a digital herd immunity, so if you don’t participate in the sharing, you support the effort by paying for the service. Think of a vaccination campaign: if everyone is vaccinated, you are protected even if you are not, but the others are carrying the cost of your protection. Here we level it because some businesses, associations, political parties, etc. just can’t share their signals.
TL;DR: What is free today, namely the product and the IP reputation service (for those sharing the IPs they block), is and will stay free.
We constantly enforce three principles:
- Collect the minimum possible data, only one in this case: the aggressive IP. (Time & scenario are not private data per se)
- Keep the data only for the necessary period of time
- Anyone will be able to remove their IP from our database (but it can be reintroduced automatically if the IP was not cleaned, and the cooldown between requests gets exponentially longer)
Lawyers are also contracted to review our policies and validate that our processes are GDPR compliant. We will, very soon, release more details and legal work around those points, as well as update our website’s footer & cookies to be fully compliant with international regulations. More information about our compliance with GDPR can be found in this dedicated section.
Poisoning & False positives (aka Consensus)
Every network member (watchers sharing their signals) gets a trust rank (TR). By consistently sending back valuable and accurate information, a member’s TR improves over time. A daemon that reports valuable information for months with 100% accuracy will eventually reach the maximum TR. Feeding the system with wrong information results in a severe and immediate loss of TR. This mechanism is designed to prevent poisoning.
Members of any TR can take part in the consensus, but only those with the highest TR can publish to the database without validation from our own honeypot network. Their reports nevertheless have to pass the test of the canary list, meaning the reported IP must not be one of the canaries. Canaries are whitelisted IPs known to be trustworthy, like the Google bot, Microsoft updates, etc. If a scenario is too sensitive or twitchy, it might shoot a canary. This mechanism is designed to prevent false positives.
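The interplay between trust rank, honeypot validation, and the canary list can be sketched as below. This is only an illustration under stated assumptions: the numeric scale (`MAX_TRUST_RANK = 5`), the `Verdict` names, and modeling "honeypot validation" as membership in a set of IPs already seen by the honeypots are all hypothetical and not taken from the actual consensus engine.

```python
from enum import Enum
from typing import Set

MAX_TRUST_RANK = 5  # assumption: the real scale is not described in the text


class Verdict(Enum):
    REJECTED_CANARY = "rejected: reported IP is a canary"
    PUBLISH = "publish to the database"
    NEEDS_VALIDATION = "wait for honeypot validation"


def evaluate_report(
    ip: str,
    reporter_rank: int,
    canaries: Set[str],
    seen_by_honeypots: Set[str],
) -> Verdict:
    # The canary test applies to every reporter, whatever their trust rank.
    if ip in canaries:
        return Verdict.REJECTED_CANARY
    # Only the highest trust rank may publish without honeypot validation.
    if reporter_rank >= MAX_TRUST_RANK:
        return Verdict.PUBLISH
    # Lower ranks need confirmation (modeled here as a honeypot sighting).
    if ip in seen_by_honeypots:
        return Verdict.PUBLISH
    return Verdict.NEEDS_VALIDATION
```

Checking the canary list first reflects the text's statement that even the most trusted reporters must pass it.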
An ML algorithm will (soon) be trained on our honeypot network logs to further rule out false positives and also highlight low-noise attacks, such as IPs working in a coordinated fashion where some of them aren’t directly violating rules (for example, one doing a basic port check before another one compromises a machine).
All those mechanisms (and more to come) contribute to what we call the Consensus chamber (Consensus in short), where the decision is taken to either ban the IP responsible for an alert or not.
We highly recommend that users always choose the “softest” remedy. Here is why.
Any IP can be used to give access to a large number of users. Think of a large corporate network allowing 35,000 users to surf the net through 4 proxies, for example. If you ban one IP, you could block some of the 34,999 legitimate users just to stop one hacker, and that would be overkill. Also, some IPs are used by CGNAT (Carrier-Grade NAT) or belong to variable IP pools. Users behind those IPs are not always the same, so blocking the IP is neither a real option nor an efficient remedy.
To avoid these problems, CrowdSec uses several mechanisms. One of them is to only keep an IP for 72h in our database. If this IP hasn’t shown any sign of further aggression within this timeframe, we consider that it has been cleaned or that it was a variable IP, and it’s removed from the blocklist.
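The 72-hour retention rule can be illustrated with a short sketch. The function name `prune_blocklist` and the dictionary mapping each IP to its last sighting are assumptions made for the example; only the 72h window comes from the text.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

RETENTION = timedelta(hours=72)  # the window stated in the text


def prune_blocklist(last_seen: Dict[str, datetime], now: datetime) -> Dict[str, datetime]:
    """Keep only the IPs whose last aggressive sighting is within the window."""
    return {ip: seen for ip, seen in last_seen.items() if now - seen < RETENTION}
```

An IP that triggers a scenario again simply gets a fresher `last_seen` value, which keeps it on the list for another 72 hours.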
You can choose, as a user, to use smarter ways than simply dropping the connection with your firewall.
If you protect your home network from scans, it’s not going to harm anyone if you ban this IP in your firewall, but if you run an e-commerce website, for example, you may want to be more careful. Depending on which technology you use, you could send a Captcha, reduce user rights, require mobile multi-factor authentication, slow down the connection, etc. Bouncers come in all shapes to cover a wide variety of use cases. Choose them wisely, and please, use the least aggressive remedy that will keep your assets safe.
Anyone can unban themselves. If the IP behind the attacks was cleaned of the security breach that probably led to the nefarious actions, we have no reason to keep it in our database forever. It could also happen that the Consensus wrongly evaluated an IP as dangerous. But we cannot ignore the fact that hackers themselves may want to unban themselves. That is why the first removal will be made within 24h. The second request to unban the same IP, though, will take more time, and the third one even more. This is designed to prevent hackers from unbanning themselves too easily. The required Captcha will also discourage attempts to clear a large number of IPs in an automated way.
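The growing delay between successive unban requests could look like the sketch below. The text only says that the first removal happens within 24h and that later ones take longer; the doubling factor and the function name `unban_delay` are assumptions chosen for illustration.

```python
from datetime import timedelta

FIRST_DELAY = timedelta(hours=24)  # "within 24h" for the first removal
GROWTH_FACTOR = 2                  # assumption: the real growth rate is not stated


def unban_delay(request_number: int) -> timedelta:
    """Maximum processing delay for the n-th unban request of the same IP."""
    if request_number < 1:
        raise ValueError("request_number starts at 1")
    return FIRST_DELAY * GROWTH_FACTOR ** (request_number - 1)
```

Under these assumptions, the first request is handled within 24 hours, the second within 48, and the third within 96, which makes repeated self-unbanning progressively less practical.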
Any IP that wasn’t spotted for at least 72 hours will also be automatically cleared, without the need for admins to unban themselves.
We have a deep admiration for fail2ban’s work and are in contact with some of its contributors.
Cyril started it as a Python exercise for himself, then many others made it the default security component we all know.
(We do not name them here, to avoid forgetting anyone, but also to preserve the ones not willing to get exposure, but you all know who you are)
Nevertheless, Fail2ban was created 16 years ago, based on Python. CrowdSec capitalizes on its philosophy, but the company behind it provides more work power and a long-term sustainable model, allowing high-profile developers and security experts to dedicate themselves 100% to this software.
Also, years apart, we make different choices and adopt newer models, such as the decoupled approach, a faster language (Golang), an inference engine, YAML & Grok, IPv6, an API-first approach, multi-layer awareness, a hub to find your configurations, IP reputation, multi-OS compatibility, etc.
Whatever future awaits CrowdSec, the team extends its greetings to this formidable piece of software & team, that has written part of security history on Unix hosts.
Open-sourcing the consensus engine
Some people have asked why we aren’t open-sourcing the “central intelligence”, aka the “global consensus”, part. We are focused on making the CrowdSec suite suitable software for the open-source world, which means there is constant arbitration between maximum efficiency and compatibility with the larger population. Rather often, we make our decisions based on the fact that we want the majority of users to be able to use CrowdSec on a daily basis without unnecessary complexity. This is reflected in a lot of the technical choices we make, from the libraries we choose to the attention we pay to observability or even the parser/scenario syntax.
It should also be noted that there is *no* dependence between CrowdSec and the central API mechanism: it is not required for CrowdSec to work, and data push & pull can simply be disabled. While those constraints apply to the open-source part that we distribute to everyone, we don’t want to apply the same restrictions to the central decision-making system and processes. This part is operated by us and us only, and we don’t and won’t compromise efficiency for simplicity. That is in part why we chose a public cloud platform to build this part (mostly AWS as we speak), and we’re making a lot of tradeoffs for the sake of getting faster to where we’re aiming to be: a sensational reputation engine that will be able to compute and redistribute sightings to all the participants of the network. Maybe one day we’ll discuss redistributing this part, but this day is not in sight yet: we’re making a lot of architectural changes on a nearly weekly/monthly basis, and attempting to open-source it would only increase the development cost while reducing our velocity, and would most likely be a nightmare for anyone trying to operate it!
Privileged access to the IP Reputation database
Some experts can get free, read-only, access to our database, under conditions.
They need to contact us to get back-office access and be able to prove they belong to one of the following categories:
- Law enforcement forces
- Professional security researchers
- Students in data science or security
Before accessing the data, a waiver and an NDA will have to be signed by the participant. Professional categories that aren’t listed can still contact us to ask for access. Authorizations will be granted if the purpose of the study is both ethical & legitimate.