GDPR compliance

Everything you need to know about CrowdSec and GDPR.

If your question is not covered here, please get in touch with the team via email, GitHub, or Gitter. We will be happy to help.

Introduction 

First and foremost, privacy is of the highest importance to all of us at CrowdSec.

The CrowdSec business model doesn’t involve advertising, so we do not try to profile our users to learn their behavior or preferences. An IP address is nevertheless considered personal data. The point at stake is the ability to accurately identify a single individual based on the information held about them. An IP address, correlated with an exact time, would allow someone with access to an internet provider’s records (or a carrier’s, on a cellular network) to pinpoint who is behind a connection.

This is a real problem, since our reputation system relies on collectively sharing aggressive IPs.

However, collecting, processing, and storing personal information is not forbidden as such, as long as very precise and strict rules are followed.

In simplified form, those rules include:

  1. Collect & keep the strict minimum
  2. Keep it for the shortest possible time
  3. Properly protect the data you need to have
  4. Allow anyone to correct or remove data about themselves (here, their IP address)

The company that publishes CrowdSec is based in France, which means we have to comply with both GDPR and the requirements of the CNIL, the French data protection authority.

How we deal with GDPR and data protection

We have appointed a law firm that helps us comply with GDPR in the strictest possible way. GDPR is one of the most accurate and efficient data privacy frameworks worldwide, with principles similar to, but often stricter than, those of existing frameworks in other countries.

We also hired a DPO (Data Protection Officer). This person is in charge of making sure that the company walks the talk: we don’t only claim that we protect your privacy, we actually do it.

The challenges 

CrowdSec faces very specific challenges. First, hackers tend not to sign waivers and acceptance forms when they try to breach a system. Hence, it’s hard to collect their consent.

Second, most of the time hackers use IP addresses that don’t belong to them, which means they impersonate someone else’s digital identity to commit their crimes.

Last but not least, hackers can also try to report false information to the system. Those poisoning attempts could harm a third party, who had no reason to be blocked.

Fortunately, the GDPR does not revolve solely around the notion of consent. There are five other legal bases that can legitimize the processing of personal data, including the “legitimate interest” of the data controller: here, the right to organize in order to resist hacking attempts. We take these problems as seriously as possible. That is why putting together the legal documentation, sending it to the authorities, and collecting their answers takes time; months, in fact. We started to work on these specific points around September 2020.

We looked for the best way to protect the rights of the people whose IP addresses would be processed and, to this end, conducted a specific Privacy Impact Assessment.

The remediations

Collect & keep the strict minimum

First, sharing signals with CrowdSec is not mandatory; this feature can be deactivated in the configuration file. Second, when a scenario is triggered, only three pieces of data are sent to us and kept for processing:

  • The timestamp (the time at which the offensive behavior was detected)
  • The offending IP (not the targeted IP)
  • The scenario that was triggered (credential brute force, port / web scanning, stuffing, etc.)

We never export logs or any information beyond these three items. This is the strict minimum we need to establish a proper consensus (validating the signal to avoid poisoning and false positives).
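As an illustration only, the three fields listed above can be pictured as a small record. The class name, field names, and scenario label below are hypothetical and do not describe CrowdSec's actual wire format.

```python
from dataclasses import dataclass, asdict
from datetime import datetime, timezone


@dataclass(frozen=True)
class Signal:
    """Hypothetical shape of a shared signal: three fields and nothing else."""
    timestamp: datetime  # when the offensive behavior was detected
    source_ip: str       # the offending IP, never the targeted one
    scenario: str        # name of the scenario that was triggered


signal = Signal(
    timestamp=datetime(2020, 12, 3, 14, 55, 12, tzinfo=timezone.utc),
    source_ip="22.33.44.55",
    scenario="credential-brute-force",  # illustrative label, not a real scenario name
)

# No log lines, target address, or user data appear in the payload.
payload = asdict(signal)
```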

Keep it for the shortest possible time

This point is not easy, because detection can be more accurate with a larger data set, that is, with longer retention of traces. The industry standard is around one year. We adopted a two-step strategy:

  • In order for the services we offer to work, we adopted a stricter schedule:
    • After 3 months, we start degrading the stored data, in terms of both IP precision and time precision. For example, a theoretical IPv4 address 22.33.44.55 would become 22.33.44.48/28 after that period (the block of 16 possible addresses that contains it).
    • After 6 months, the degradation becomes more severe, making it very difficult to point to one person only: 22.33.44.55 would become 22.33.44.0/24, i.e. one IP out of 256. (A similar mechanism is applied to IPv6 addresses.)
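The IP degradation described above amounts to replacing an address with the network block that contains it. A minimal sketch using Python's standard ipaddress module follows; the function name degrade_ip is hypothetical, and this is not presented as CrowdSec's actual implementation. Note that in canonical CIDR notation the /28 block containing 22.33.44.55 is written 22.33.44.48/28.

```python
import ipaddress


def degrade_ip(ip: str, prefix_len: int) -> str:
    """Return the network of the given prefix length that contains ip."""
    # strict=False lets ip_network mask the host bits instead of raising.
    network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
    return str(network)


after_3_months = degrade_ip("22.33.44.55", 28)  # 22.33.44.48/28, 16 addresses
after_6_months = degrade_ip("22.33.44.55", 24)  # 22.33.44.0/24, 256 addresses
```

The same function accepts IPv6 addresses; the prefix lengths used for IPv6 are not stated here, so none are assumed.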

We also degrade this data in terms of time. Taking the same example, suppose 22.33.44.55 was seen showing aggressive behavior on 3 December 2020 at 14:55:12. After a few weeks, the timestamp becomes 14:00, and then just 3/12/2020 after 6 months.

Having both a vague IP range and a vague timestamp makes it impossible to accurately relate one event to one person. Nevertheless, after one year, a final step is applied to avoid any risk of re-identification: the timestamp is degraded further, to a precision of one week.

For the sole purpose of internal use, to optimize our artificial intelligence on the one hand and to offer post-mortem services on these specific cases on the other, we may need to keep this information for a maximum period of one year.
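The successive time degradations (hour, then day, then week) can be sketched as simple truncations. The function names are hypothetical, and representing a week by its Monday is an assumption made for this example, not something stated by CrowdSec.

```python
from datetime import date, datetime, timedelta


def to_hour(ts: datetime) -> datetime:
    """Keep only the hour: 14:55:12 becomes 14:00:00."""
    return ts.replace(minute=0, second=0, microsecond=0)


def to_day(ts: datetime) -> date:
    """Keep only the calendar day."""
    return ts.date()


def to_week(ts: datetime) -> date:
    """Keep only the week, represented here by its Monday (an assumption)."""
    day = ts.date()
    return day - timedelta(days=day.weekday())


seen = datetime(2020, 12, 3, 14, 55, 12)
hour_precision = to_hour(seen)  # 2020-12-03 14:00:00
day_precision = to_day(seen)    # 2020-12-03
week_precision = to_week(seen)  # 2020-11-30, the Monday of that week
```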

Properly protect the data you need to have

CrowdSec’s team is composed of people coming mostly from pentesting, SecOps, and high-security hosting. This doesn’t mean that we will never make a mistake, but it guarantees that we have good security reflexes and are not only aware of the topic but also trained in it and sensitive to it. Still, trust doesn’t exclude control, which is why we have a DPO and run security tests on both our code and our infrastructure.

Giving tools to correct or remove data

We offer a way for anyone to remove their IP from our database. The process is automated: you just need to fill out a form protected by a CAPTCHA. The CAPTCHA prevents hackers from unbanning their IPs automatically. They can still unban themselves, but only one IP at a time, and with a cool-down period. Should the IP land back in the database because of further aggressive behavior, the time before it can be cleared again increases, so that the same hacker cannot keep unbanning the same IP, even manually.
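To make the idea of an increasing cool-down concrete, here is a minimal sketch. The base delay of one hour and the doubling rule are assumptions chosen purely for illustration; the actual durations and growth rule are not given here.

```python
from datetime import timedelta

# Assumed value for illustration; the real cool-down is not stated in this document.
BASE_COOLDOWN = timedelta(hours=1)


def cooldown_after(previous_removals: int) -> timedelta:
    """Waiting time before an IP can be cleared again.

    The delay doubles with each earlier removal of the same IP
    (a hypothetical rule standing in for "the time will increase").
    """
    return BASE_COOLDOWN * (2 ** previous_removals)


first_wait = cooldown_after(0)   # base delay
fourth_wait = cooldown_after(3)  # eight times the base delay
```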

We also have a mechanism that automatically cleans from the database all IPs that have not been seen committing any further aggression in the last 72 hours.

Poisoning and false positives are dealt with by the consensus algorithm; more on this can be found in our FAQ.
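The 72-hour cleanup can be pictured as a filter on the time each IP was last seen. This is a sketch under assumptions: the in-memory dictionary and the name purge_inactive are hypothetical and stand in for whatever storage is really used.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict

RETENTION = timedelta(hours=72)  # the 72-hour window mentioned above


def purge_inactive(last_seen: Dict[str, datetime], now: datetime) -> Dict[str, datetime]:
    """Keep only the IPs seen behaving aggressively within the retention window."""
    return {ip: seen for ip, seen in last_seen.items() if now - seen <= RETENTION}


now = datetime(2020, 12, 10, 12, 0, tzinfo=timezone.utc)
database = {
    "22.33.44.55": now - timedelta(hours=80),  # quiet for more than 72 hours
    "22.33.44.56": now - timedelta(hours=5),   # recently aggressive
}
remaining = purge_inactive(database, now)
```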

 
