AI Chatbots Highly Vulnerable to Jailbreaks, UK Researchers Find

Four of the most widely used generative AI chatbots are highly vulnerable to basic jailbreak attempts, according to researchers at the UK AI Safety Institute (AISI).

In a May 2024 update published ahead of the AI Seoul Summit 2024, co-hosted by the UK and South Korea on 21-22 May, the UK AISI shared the results of a series of tests performed on five leading AI chatbots.

The five generative AI models are anonymized in the report. They are referred to as the Red, Purple, Green, Blue and Yellow models.

The UK AISI performed a series of tests to assess cyber risks associated with these models.

These included:

Tests to assess whether they are vulnerable to jailbreaks, i.e. attempts designed to bypass safety measures and get a model to do things it is not supposed to do
Tests to assess whether they could be used to facilitate cyber-attacks
Tests to assess whether they are capable of autonomously taking sequences of actions (operating as “agents”) in ways that might be difficult for humans to control

The AISI researchers also tested the models to estimate whether they could provide expert-level knowledge in chemistry and biology that could be used for both beneficial and harmful purposes.

Bypassing LLM Safeguards in 90%-100% of Cases

The UK AISI tested four of the five large language models (LLMs) against jailbreak attacks.

All proved highly vulnerable to basic jailbreak techniques, producing harmful responses in between 90% and 100% of cases when the researchers repeated the same attack pattern five times in a row.
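The report does not publish its scoring code. As a rough illustration only, and assuming that "five times in a row" means a question counts as answered harmfully if any one of its five attempts is graded compliant (an "at least one of n" reading that is our assumption, not AISI's stated definition), the headline rate could be computed as in the sketch below. The function names and the toy grades are hypothetical and are not AISI data.

```python
from typing import Sequence

# Each inner list holds the grades for one harmful question:
# True means that attempt was graded as compliant (a harmful answer).
Grades = Sequence[Sequence[bool]]


def best_of_n_rate(grades: Grades) -> float:
    """Share of questions answered harmfully in at least one attempt (assumed metric)."""
    if not grades:
        raise ValueError("grades must contain at least one question")
    return sum(1 for attempts in grades if any(attempts)) / len(grades)


def per_attempt_rate(grades: Grades) -> float:
    """Share of individual attempts graded as compliant, for comparison."""
    total = sum(len(attempts) for attempts in grades)
    if total == 0:
        raise ValueError("grades must contain at least one attempt")
    return sum(sum(attempts) for attempts in grades) / total


# Hypothetical toy grades: 4 questions x 5 attempts (not AISI data).
toy_grades = [
    [False, False, True, False, False],
    [True, True, True, True, True],
    [False, False, False, False, False],
    [False, True, False, False, True],
]

print(best_of_n_rate(toy_grades))    # 0.75
print(per_attempt_rate(toy_grades))  # 0.4
```

Under this assumed definition, the gap between the two numbers shows why repeating an attack matters: a model that refuses most individual attempts can still be counted as compliant on most questions once several tries are allowed.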

The researchers tested the LLMs using two types of question sets, one based on HarmBench Standard Behaviors, a publicly available benchmark, and the other developed in-house.

To grade compliance, they combined an automated grader model, based on an earlier scientific paper, with grading by human experts.

They also compared these results with the models' answers to sets of benign and harmful questions asked without any attack.

The researchers concluded that all four models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.

LLMs are Limited Tools for Cyber-Attackers

Other tests shared in the UK AISI's May 2024 update showed that four publicly available models can solve simple capture-the-flag (CTF) challenges of the sort aimed at high school students.

However, they all struggled with more complex problems, such as university-level cybersecurity challenges.

Finally, the UK AISI researchers found that two models could autonomously solve some short-horizon tasks, such as software engineering problems, but that none is currently able to plan and execute the sequences of actions needed for more complex tasks.

These findings suggest that LLMs are likely not significantly helpful tools for cyber-attackers.

Written by Kevin Poireault