Machine learning (ML) and artificial intelligence (AI) are essential components of modern, effective cybersecurity solutions. However, as ML and AI become increasingly common in cybersecurity and the industry grows more reliant on these technologies to stem the ever-increasing volume of threat data, it is important to remember that AI is not a panacea and brings its own new attack surface. Decision-makers evaluating their cybersecurity posture need to be aware of these risks and of the inherent limitations of AI so they can verify that their defenses are robust and resilient against new threats such as adversarial machine learning (AML). AML is an emerging field of study concerned with attacks on ML- and AI-based systems, including the deception and evasion of AI detectors. It is crucial for defenders to be aware of these novel attacks and to be able to recognize them.
What Do Attacks on ML Models Look Like?
ML solutions use a range of algorithms and statistical methods to analyze datasets and identify patterns. The foundations of these methods inherently allow for novel types of attacks on systems that use AI and ML. The MITRE ATLAS framework enumerates and classifies techniques for attacking ML systems.
One such technique is data poisoning, which aims to manipulate the data used to train AI models. AI models learn how to react to input from large datasets called “ground truth.” The ground truth defines what the appropriate output of the model should look like – it is the reference the model is trained to reproduce. Attackers can attempt to inject erroneous information into the ground truth, which is then incorporated into the training process. Manipulating training in this way causes the model to react incorrectly to certain inputs. For example, attackers can trick the model into classifying a malware file as a legitimate application.
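To make the mechanics concrete, the following is a minimal, hypothetical sketch of label-flipping poisoning on a toy dataset: a fraction of the “malware” training samples is relabeled as “benign” before training, and the resulting model typically becomes noticeably less confident about a clearly malware-like sample. The features, labels, and classifier here are purely illustrative and not tied to any real product.

```python
# Minimal label-flipping data poisoning sketch (toy, hypothetical data).
# A fraction of "malware" training samples is relabeled as "benign",
# shifting the decision boundary the classifier learns.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Toy ground truth: 2 numeric features per file, label 1 = malware, 0 = benign
benign = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(200, 2))
malware = rng.normal(loc=[2.0, 2.0], scale=0.5, size=(200, 2))
X = np.vstack([benign, malware])
y = np.array([0] * 200 + [1] * 200)

clean_model = LogisticRegression().fit(X, y)

# Poisoning: attacker flips labels of a subset of malware samples to "benign"
y_poisoned = y.copy()
flip_idx = rng.choice(np.where(y == 1)[0], size=60, replace=False)
y_poisoned[flip_idx] = 0

poisoned_model = LogisticRegression().fit(X, y_poisoned)

# The poisoned model is markedly less confident that a clearly
# malware-like sample is actually malicious
sample = np.array([[1.8, 1.9]])
print("clean    P(malware):", clean_model.predict_proba(sample)[0, 1])
print("poisoned P(malware):", poisoned_model.predict_proba(sample)[0, 1])
```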
Data poisoning attacks can be carried out in various ways, including gaining access to ground truth datasets as part of a traditional security breach. A more common and impactful technique, however, is manipulating the public datasets used to train data-hungry AI algorithms. In cases where AI learns directly from user input, adversaries can leverage that access to corrupt the AI system. This happened with the Twitter bot Tay, whose AI was supposed to learn from conversations with other Twitter users. Users deliberately manipulated the bot, causing it to post hate speech on the social media platform in less than a day.
ML systems are also susceptible to evasion attacks, in which attackers try to fool the model’s prediction system. Attackers can use so-called adversarial examples – input data with small perturbations intended to confuse the ML system into an erroneous classification. A typical example of this type of attack is changing a few pixels in an image before uploading it so that the image recognition system fails to classify it or classifies it differently. Such minor pixel alterations are often invisible to humans or not directly recognizable as an attack but nonetheless result in radically different model outputs.
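The sketch below illustrates the perturbation idea in its simplest form, using a hypothetical linear scoring model in place of a real image classifier: because the gradient of a linear score with respect to its input is just the weight vector, nudging each feature by a small amount against the sign of the weights shifts the score toward the other class while keeping every individual change tiny (an FGSM-style perturbation). All weights and inputs are randomly generated for illustration.

```python
# Minimal sketch of an FGSM-style evasion perturbation against a linear
# scoring model (hypothetical weights; a stand-in for an image classifier).
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical linear model: score > 0 means "class A", otherwise "class B"
w = rng.normal(size=64)          # model weights (one per input feature/pixel)
b = 0.0
x = rng.normal(size=64)          # original input, e.g. flattened image pixels

def score(v):
    return float(w @ v + b)

# For a linear model, the gradient of the score w.r.t. the input is simply w,
# so stepping against sign(w) pushes the score toward the other class
# while changing each feature by at most epsilon.
epsilon = 0.1
x_adv = x - epsilon * np.sign(w) if score(x) > 0 else x + epsilon * np.sign(w)

print("original score:      ", score(x))
print("perturbed score:     ", score(x_adv))
print("max per-pixel change:", np.max(np.abs(x_adv - x)))
```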
In a cybersecurity-specific case of an evasion attack, security researchers manually modified a malicious file so that an antivirus vendor’s AI-based detection would rate it as legitimate. The researchers did this by extracting strings from legitimate software and adding them to the malware. The vendor’s AI model gave more weight to these legitimate strings than to the malicious routines in the file and consequently misclassified the file as benign.
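A simplified illustration of why this worked, assuming a naive scoring model that simply sums positive (malicious) and negative (benign) feature weights: appending enough benign strings drags the total below the detection threshold. The feature names, weights, and threshold are invented for illustration and do not reflect any vendor’s actual model.

```python
# Simplified illustration of the string-appending evasion: a naive scoring
# model that sums positive (malicious) and negative (benign) feature weights
# can be dragged below its detection threshold by appended benign strings.
# All feature names and weights here are illustrative, not a real model.
MALICIOUS_WEIGHTS = {"injects_remote_thread": 3.0, "disables_defender": 2.5}
BENIGN_WEIGHTS = {"microsoft_copyright_string": -1.5, "ui_resource_strings": -1.0,
                  "code_signing_metadata": -2.0}
THRESHOLD = 2.0  # score >= threshold => flag as malware

def naive_score(features):
    weights = {**MALICIOUS_WEIGHTS, **BENIGN_WEIGHTS}
    return sum(weights.get(f, 0.0) for f in features)

original = {"injects_remote_thread", "disables_defender"}
padded = original | {"microsoft_copyright_string", "ui_resource_strings",
                     "code_signing_metadata"}

print("original:", naive_score(original), "flagged:", naive_score(original) >= THRESHOLD)
print("padded:  ", naive_score(padded), "flagged:", naive_score(padded) >= THRESHOLD)
```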
Knowledge Is Power When It Comes to AI Models
Prior knowledge about the target ML system also affects an attack’s chances of success. The more attackers know about the AI system and its architecture, the easier it is to launch an attack and select the appropriate attack method. In the case of the aforementioned evasion attack on the antivirus vendor, the attackers had access to the model and the software. This is called a white box attack. The attackers could analyze the algorithms and thus find the right strings to deceive the system successfully.
At the other end of the spectrum are black box attacks, where attackers have little or no knowledge of the AI model. If the model outputs a statistical confidence along with its classification, e.g., the probability that a file is malware, attackers can use gradient-based methods. They can iteratively modify a malware file, check the malware probability the model computes, and adjust the next round of modifications depending on whether the probability goes up or down. In this way, they approach their objective step by step, like in a game of “hot and cold,” until the file receives a very low probability of being malware.
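A rough sketch of this “hot and cold” loop follows, assuming a hypothetical black-box scorer the attacker can only query: each candidate modification is kept if it lowers the returned malware probability and discarded otherwise, until the score falls below the attacker’s target. The scorer internals and the modification names are stand-ins, not a real detector.

```python
# Sketch of the "hot and cold" black-box approach: iteratively try candidate
# modifications, keep the ones that lower the malware probability returned by
# a (hypothetical) black-box model, and stop once the score is low enough.
import random

random.seed(0)

def black_box_malware_probability(features):
    # Stand-in for querying the victim model; in reality this would be an
    # API call or a local scan whose internals the attacker cannot see.
    weights = {"injects_remote_thread": 0.45, "disables_defender": 0.35,
               "benign_string_block": -0.25, "signed_section": -0.30}
    score = sum(weights.get(f, 0.0) for f in features)
    return max(0.0, min(1.0, score))

candidate_mods = ["benign_string_block", "signed_section",
                  "padding_bytes", "extra_resources"]

features = {"injects_remote_thread", "disables_defender"}
prob = black_box_malware_probability(features)

while prob > 0.3 and candidate_mods:
    mod = candidate_mods.pop(random.randrange(len(candidate_mods)))
    trial = features | {mod}
    trial_prob = black_box_malware_probability(trial)
    if trial_prob < prob:           # "warmer": keep the modification
        features, prob = trial, trial_prob
    # otherwise "colder": discard the modification and try another

print("final probability:", prob, "features:", sorted(features))
```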
How to Protect Machine Learning Models
To protect ML systems, defenders can use methods that prevent, complicate, or detect attacks. For example, even if an attacker adds benign strings to a malware file, a monotonic classification model will still detect the file correctly – it does not matter to the model how many benign traits a file has if malware traits are also present.
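Continuing the illustrative scoring example from above, the sketch below contrasts a naive additive scorer with a monotonic one whose feature weights are clipped at zero, so adding features can never lower the score and the padded malware file still gets flagged. All names, weights, and the threshold are hypothetical.

```python
# Toy contrast between a naive additive scorer and a monotonic one.
# The monotonic scorer only lets features add to the score (weights clipped
# at zero), so appending benign strings can never lower the verdict.
# Feature names and weights are illustrative.
WEIGHTS = {"injects_remote_thread": 3.0, "disables_defender": 2.5,
           "microsoft_copyright_string": -1.5, "code_signing_metadata": -2.0,
           "ui_resource_strings": -1.0}
THRESHOLD = 2.0

def naive_score(features):
    return sum(WEIGHTS.get(f, 0.0) for f in features)

def monotonic_score(features):
    # Adding any feature can only increase (or keep) the score.
    return sum(max(WEIGHTS.get(f, 0.0), 0.0) for f in features)

malware = {"injects_remote_thread", "disables_defender"}
padded = malware | {"microsoft_copyright_string", "code_signing_metadata",
                    "ui_resource_strings"}

for name, scorer in [("naive", naive_score), ("monotonic", monotonic_score)]:
    print(name, "padded file flagged:", scorer(padded) >= THRESHOLD)
```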
Gradient-based attacks can be complicated by models that output only so-called hard labels, i.e., categories (e.g., “malware” or “benign application”) rather than probabilities as the final result. However, an attacker could train a proxy model using the victim model’s hard-label output as ground truth, assuming the attacker can collect a sufficient number of output records. This proxy model can then be used to approximate the gradient of the victim model, as sketched below. The goal for defenders is therefore not to thwart every possible attack but to increase the adversary’s cost of finding viable routes to attack ML systems and to ensure that they can detect when their ML systems are under attack.
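The sketch below shows the proxy-model idea under simplified assumptions: a hypothetical victim model returns only hard labels, the attacker labels their own query inputs with those verdicts and fits a local logistic regression, and the proxy then exposes a smooth probability surface (and gradients) the victim itself never revealed.

```python
# Sketch of training a proxy model from hard-label outputs of a victim model.
# The victim returns only "malware"/"benign"; the attacker labels their own
# inputs with those verdicts and fits a local model whose scores (and
# gradients) they can then inspect freely. Victim internals are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
secret_w = np.array([1.5, -2.0, 0.75, 1.0])   # hidden victim parameters

def victim_hard_label(x):
    return int(x @ secret_w > 0)               # 1 = "malware", 0 = "benign"

# Attacker queries the victim with their own samples and records the labels
X_query = rng.normal(size=(2000, 4))
y_query = np.array([victim_hard_label(x) for x in X_query])

proxy = LogisticRegression().fit(X_query, y_query)

# The proxy now exposes a smooth probability surface the attacker can use
# to guide gradient-style modifications, even though the victim never did.
sample = rng.normal(size=(1, 4))
print("victim hard label:", victim_hard_label(sample[0]))
print("proxy P(malware): ", float(proxy.predict_proba(sample)[0, 1]))
```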
Defenders can draw on a wider range of data sources, for example through extended detection and response (XDR), to protect themselves from AML attacks. However, it is important for defenders not to rely solely on AI and blindly trust its results. Because AI brings its own attack surface, defenders need to avoid AI monocultures and incorporate other powerful approaches, such as indicators of attack. Ultimately, security vendors need one thing above all else: human expertise. After all, the ability to recognize adversarial ML attacks and adapt one’s AI models accordingly is crucial for building a robust defense.