Big Data and Cybersecurity - Making it Work in Practice

In today’s complex IT environment, identifying security events quickly is critical to minimizing their impact. To detect and remediate attacks in this environment, however, security teams need the right tools to process and correlate massive amounts of real-time and historical security event data. By applying advanced analytics to that data, infosec teams can better detect and defend against sophisticated attacks.
 
Implementing this in the real world is easier said than done. The sheer variety of attack vectors, combined with the volume of data to sift through, makes extracting security insight difficult. Preventing one type of attack is not enough, either: according to Verizon’s 2017 Data Breach Investigations Report, attackers use a mix of techniques during their campaigns. Around 62% of attacks were linked to hacking, while 51% made use of malware and 43% included social attacks. Alongside this, 14% of attacks were the result of employee mistakes that left exploitable gaps in security.
 
To make matters more complicated, cyber-criminals have begun to use artificial intelligence within their own systems to rapidly scale attacks, personalize phishing emails, identify system vulnerabilities, and mutate malware and ransomware in real time. Outmaneuvering these increasingly complex attacks requires cybersecurity teams to monitor their networks for a broad range of threats that may or may not resemble traditional threat patterns.
 
Using data to stay ahead of potential threats can help. Companies have huge volumes of data at their disposal, flowing in from a diverse set of sources including intrusion detection systems, network infrastructure and server logs, application logs and more. This data can quickly add up for large enterprises, totaling petabytes in size.
 
When a suspicious event is identified, threat response teams need to run queries in real time against those large historical and streaming datasets to verify the extent and validity of a potential breach. This forensic analysis should confirm the threat, flag it for further investigation, or discount the anomaly.
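As a concrete illustration, the sketch below shows what such a forensic query might look like in PySpark. The storage path, column names and indicator value are hypothetical placeholders; a real deployment would substitute its own log store and indicators of compromise.

```python
# A minimal sketch of a forensic query over historical connection logs,
# assuming a hypothetical Parquet dataset and schema.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("forensic-triage").getOrCreate()

# Historical network events, e.g. partitioned Parquet on object storage.
events = spark.read.parquet("s3://security-lake/network_events/")

suspect_ip = "203.0.113.42"  # hypothetical indicator of compromise

# Scope the scan to the window around the suspicious event, then find
# every internal host that talked to the suspect address.
matches = (
    events
    .filter(F.col("event_time").between("2017-06-01", "2017-06-08"))
    .filter(F.col("dest_ip") == suspect_ip)
    .groupBy("source_host")
    .agg(
        F.count("*").alias("connection_count"),
        F.min("event_time").alias("first_seen"),
        F.max("event_time").alias("last_seen"),
    )
    .orderBy(F.desc("connection_count"))
)

matches.show(20, truncate=False)
```

The value of a distributed engine here is that the same query runs unchanged whether the dataset is gigabytes or petabytes, which is exactly what rapid breach validation demands.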
 
In practice, this means having enough processing power to analyze billions of records within seconds. Security analysts also need data that is highly accurate and consistent, a common challenge when scaling big data pipelines in real time.

What’s stopping you from using those huge volumes of data?
Normally, Security Information and Event Management (SIEM) platforms would be used to manage all this data. However, many of these solutions were not built with big data in mind. Threat detection devices can produce petabytes of log data that need to be contextualized and analyzed in real time, and processing data at that scale takes significant computing power.
 
The economics here are interesting. Many SIEM and security analytics tools were built for on-premises environments, and scaling on-premises infrastructure to meet current data demands is a costly proposition.

Additionally, most SIEM tools charge customers per gigabyte of data ingested, which makes scaling up to handle large data volumes prohibitively expensive.
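A back-of-the-envelope sketch makes the point; both figures below are hypothetical placeholders, not vendor quotes.

```python
# Hypothetical per-GB SIEM ingestion pricing at enterprise log volumes.
daily_ingest_gb = 2_000   # ~2 TB/day of security telemetry (assumed)
price_per_gb = 1.50       # hypothetical per-GB ingestion fee, USD

annual_cost = daily_ingest_gb * price_per_gb * 365
print(f"Annual ingestion cost: ${annual_cost:,.0f}")  # ~$1.1M per year
```

Because the fee scales linearly with volume, every new log source makes the bill worse, and teams end up rationing what they ingest.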
 
Similarly, most security teams have access to only a few weeks of historical data, because storing it all at scale is expensive. Yet as soon as an event occurs, security analysts need to conduct deep historical analysis to fully investigate the validity and breadth of an attack. The cost of storage limits security teams’ ability to identify attacks that unfold over long periods, and makes it hard to conduct forensic reviews in real time.
 
Another common challenge is the high volume of false positives produced by SIEM tools. With so much data captured in OS logs, cloud infrastructure logs, intrusion detection systems and other monitoring devices, it’s easy to flag hundreds of suspicious events each day. Some of these events may signify a compromised network; many do not. Further investigation is necessary to determine whether each threat is legitimate.
 
Relying on individuals to review hundreds of alerts, many of them false positives, results in alert fatigue. Eventually, overwhelmed security teams disregard or overlook the events that are actual threats.
 
Looking beyond the big data hurdle
The big issues around big data are not insurmountable. Cloud-based solutions can overcome the challenging economics of data processing and storage. Similarly, augmenting existing SIEM solutions with big data platforms capable of exploring and modeling diverse datasets at scale can extend visibility and improve the overall security posture of the business.
 
Another area of promise is the use of AI. Bringing data science into the fold, using machine learning to prioritize security alerts and automate response, can significantly reduce the fatigue placed on security teams. Machine learning models can also be trained to identify anomalous behavior patterns that predefined security rules miss, as the sketch below illustrates. Security teams looking to make this leap need to invest in data science skills and applications.
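As a minimal sketch of that idea, the example below scores host activity with scikit-learn’s IsolationForest. The features and synthetic data are illustrative assumptions; a production model would be trained and validated on real telemetry.

```python
# Anomaly-based alert scoring sketch: rank hosts by how unusual their
# behavior looks, instead of matching predefined rules.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Hypothetical per-host features: logins/hour, MB sent, distinct ports.
normal = rng.normal(loc=[5, 200, 3], scale=[2, 50, 1], size=(1000, 3))
suspicious = np.array([[90, 5000, 45]])  # e.g. a credential-stuffing burst

model = IsolationForest(contamination=0.01, random_state=42)
model.fit(normal)

# Lower decision_function scores mean "more anomalous"; use them to
# build a ranked review queue for analysts.
sample = np.vstack([normal[:5], suspicious])
for features, score in zip(sample, model.decision_function(sample)):
    print(features, round(float(score), 3))
```

The point is the workflow rather than the particular model: anomaly scores turn an undifferentiated wall of alerts into a ranked queue, which directly attacks the alert-fatigue problem described above.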
 
Additionally, making it easier for analysts, security teams and IT professionals to collaborate in a unified data environment can improve processes and speed-to-insight. Ultimately, taking a big data approach and choosing the right data platform to solve these challenges opens the door to better prevention and a more secure enterprise.
