Research Data Scientist (6-month internship)
Passionate about tech? Interested in cybersecurity and graph modeling? Join CrowdSec, an open-source cybersecurity solution to build a safer internet for everyone.
What we do
Our mission is to deter opportunistic and organized cybercrime through an Internet-scale, real-time security network. This network is powered by thousands of users who share the attacks they have blocked with our open-source software. By coupling local behavior analysis with the sharing of its findings across our community, we create a giant hacker radar, a large-scale reputation system, and a Crowd-Sourced Cyber Threat Intelligence of unprecedented magnitude that will counter the vast majority of technical hacks.
We collect unique data: malicious public IP addresses, crowdsourced from CrowdSec software running all around the world. Each IP address comes with three pieces of metadata:
- The time of the alert
- The CrowdSec software ID which reported the alert
- The type of intrusion triggering the alert (SSH brute-force, spamming, crawling, …)
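As a rough illustration of the data described above, one alert could be modeled as a small record holding the IP address and its three metadata fields. The class name, field names, and sample values below are assumptions made for this sketch, not CrowdSec's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Alert:
    """Hypothetical record for one reported alert (field names are assumed)."""
    ip: str              # malicious public IP address
    timestamp: datetime  # time of the alert
    reporter_id: str     # ID of the CrowdSec software that reported the alert
    scenario: str        # type of intrusion, e.g. "ssh-bruteforce"


def group_by_ip(alerts):
    """Map each IP address to the list of alerts reported against it."""
    grouped = defaultdict(list)
    for alert in alerts:
        grouped[alert.ip].append(alert)
    return dict(grouped)


# Made-up sample data using documentation-reserved IP ranges.
alerts = [
    Alert("203.0.113.7", datetime(2022, 1, 3, 10, 0), "defender-a", "ssh-bruteforce"),
    Alert("203.0.113.7", datetime(2022, 1, 3, 10, 5), "defender-b", "ssh-bruteforce"),
    Alert("198.51.100.2", datetime(2022, 1, 3, 11, 0), "defender-a", "crawling"),
]
by_ip = group_by_ip(alerts)
```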
Your role is to analyze this data and develop new algorithms that uncover the interactions between attackers (malicious IPs): How can we detect a cohort of attackers? How can we classify unseen IP addresses?
One way to tackle this problem is to define a bipartite graph where each defender is linked to the attackers they reported. The first step is to estimate the probability that an IP address is truly malicious, given the reputation of the defenders that reported it.
This ensures we only redistribute IP addresses that are truly malicious and not false positives, as we want to avoid data poisoning at all costs. The second step is to detect cohorts of attackers targeting the same machines (see, for example, the Louvain algorithm). The challenge is then to implement this in a dynamic graph framework.
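The two steps above can be sketched on a toy bipartite graph. This is only a sketch under stated assumptions: the reputation values are made up, the noisy-OR combination assumes reports are independent, and cohorts are found with a simple Jaccard threshold plus connected components rather than the Louvain algorithm itself. All names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# Bipartite edges: defender -> set of attacker IPs it reported (toy data).
reports = {
    "d1": {"ip1", "ip2", "ip3"},
    "d2": {"ip1", "ip2"},
    "d3": {"ip4", "ip5"},
    "d4": {"ip4", "ip5", "ip6"},
}
# Assumed reputation: probability that a report from this defender is genuine.
reputation = {"d1": 0.9, "d2": 0.8, "d3": 0.5, "d4": 0.7}


def invert(reports):
    """Turn defender -> IPs into IP -> defenders."""
    targets = defaultdict(set)
    for defender, ips in reports.items():
        for ip in ips:
            targets[ip].add(defender)
    return dict(targets)


def malicious_probability(targets, reputation):
    """Noisy-OR: an IP is benign only if every report against it is false."""
    probs = {}
    for ip, defenders in targets.items():
        p_all_false = 1.0
        for defender in defenders:
            p_all_false *= 1.0 - reputation[defender]
        probs[ip] = 1.0 - p_all_false
    return probs


def cohorts(targets, min_jaccard=0.5):
    """Link IPs with overlapping defender sets; return connected components."""
    neighbors = {ip: set() for ip in targets}
    for a, b in combinations(targets, 2):
        union = targets[a] | targets[b]
        if union and len(targets[a] & targets[b]) / len(union) >= min_jaccard:
            neighbors[a].add(b)
            neighbors[b].add(a)
    seen, groups = set(), []
    for ip in targets:
        if ip in seen:
            continue
        stack, group = [ip], set()
        while stack:  # depth-first traversal of one component
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            group.add(node)
            stack.extend(neighbors[node] - seen)
        groups.append(group)
    return groups


targets = invert(reports)
probs = malicious_probability(targets, reputation)
groups = cohorts(targets)
```

In this toy example, an IP reported by two reputable defenders gets a higher score than one reported by a single defender, and the IPs split into two cohorts according to which machines they targeted. A modularity-based method such as Louvain would replace the threshold-and-components step on real data.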
Another problem is the classification of new IP addresses before they are labeled as attackers. For this purpose, external data can be collected for all IP addresses (open ports, installed services, IP range, …) and used as node attributes to help characterize malicious IP addresses. One can then learn from the graph structure in an inductive manner, as in the GraphSAGE algorithm, to perform inference on unseen nodes.
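To make the inductive idea concrete, here is a minimal sketch of a single GraphSAGE-style layer with a mean aggregator, written in plain Python. The two node attributes, the neighbor lists, and the weight matrix are all made up for illustration; in practice the weights would be learned and a library implementation would be used.

```python
def mean(vectors, dim):
    """Element-wise mean of a list of vectors (zeros if the list is empty)."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_layer(node, features, neighbors, weights):
    """Embed one node: ReLU(W · concat(own features, mean neighbor features))."""
    x_self = features[node]
    x_neigh = mean([features[n] for n in neighbors.get(node, [])], len(x_self))
    combined = x_self + x_neigh  # list concatenation, not addition
    return [max(0.0, sum(w * x for w, x in zip(row, combined))) for row in weights]


# Hypothetical attributes per IP: [port 22 open, port 80 open].
features = {
    "ip1": [1.0, 0.0],
    "ip2": [1.0, 1.0],
    "new_ip": [0.0, 1.0],  # node not seen when the weights were fitted
}
# Hypothetical neighborhood of the new node (e.g. IPs related to it in the graph).
neighbors = {"new_ip": ["ip1", "ip2"]}
# Fixed illustrative weights (2 outputs x 4 inputs); these would normally be learned.
weights = [
    [1.0, 0.0, 1.0, 0.0],
    [0.0, 1.0, 0.0, -1.0],
]

embedding = sage_layer("new_ip", features, neighbors, weights)
```

Because the layer depends only on a node's attributes and those of its neighbors, the same weights can be applied to an IP address that was not seen when the weights were fitted, which is what makes the approach inductive.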
In this role, you will work closely with your tutor and interact with the core tech team (10 people). As a research intern, you will have to:
- Read research papers and review state-of-the-art methods in the literature
- Formulate problems considering available data and the objectives
- Design, implement, and test machine learning algorithms
- Continuously provide ideas to improve the software, specifically the consensus section, and take part in the team decisions.
This is an ideal opportunity if you are looking for a six-month end-of-studies internship starting between February and May 2022 and you want to work in a tech-savvy environment. The qualifications are:
- You have a solid scientific background in Machine Learning (including deep learning), Statistics, and Graph algorithms.
- You have programming skills in Python, Machine Learning libraries, data visualization packages, and data manipulation.
- You are a fast learner and have a genuine interest in cybersecurity.
- You can design and deliver working prototypes in a fast-paced environment and then work with the core team to put them into production.
- You are autonomous and do not hesitate to share new ideas with the team or challenge existing solutions.
Why join CrowdSec?
- You want to take part in an adventure with a young and fully-funded startup.
- You will work with a team of black-belt pros in their fields and have fantastic interactions with an exciting community.
- You enjoy contributing to an open-source project.
- You love to tackle challenges and develop algorithms in non-standard frameworks with unique data.
- We are a fully remote company, and we meet every four months for a one-week seminar to strengthen team spirit.