AI Chatbots Highly Vulnerable to Jailbreaks, UK Researchers Find

Four of the most used generative AI chatbots are highly vulnerable to basic jailbreak attempts, researchers from the UK AI Safety Institute (AISI) found.

In a May 2024 update published ahead of the AI Seoul Summit 2024, co-hosted by the UK and South Korea on 21-22 May, the UK AISI shared the results of a series of tests performed on five leading AI chatbots.

The five generative AI models are anonymized in the report. They are referred to as the Red, Purple, Green, Blue and Yellow models.

The UK AISI performed a series of tests to assess cyber risks associated with these models.

These included:

Tests to assess whether they are vulnerable to jailbreaks, actions designed to bypass safety measures and get the model to do things it is not supposed to
Tests to assess whether they could be used to facilitate cyber-attacks
Tests to assess whether they are capable of autonomously taking sequences of actions (operating as “agents”) in ways that might be difficult for humans to control

The AISI researchers also tested the models to estimate whether they could provide expert-level knowledge in chemistry and biology that could be used for positive and harmful purposes.

Bypassing LLM Safeguards in 90%-100% of Cases

The UK AISI tested four of the five large language models (LLMs) against jailbreak attacks.

All proved to be highly vulnerable to basic jailbreak techniques, with the models actioning harmful responses in between 90% and 100% of cases when the researchers performed the same attack patterns five times in a row.

The researchers tested the LLMs using two types of question sets, one based on HarmBench Standard Behaviors, a publicly available benchmark, and the other developed in-house.

To grade compliance, they used an automated grader model based on a previous scientific paper combined with human expert grading.

They also compared the results to LLM outputs when asked sets of benign and harmful questions without using attack patterns.

The researchers concluded that all four models comply with harmful questions across multiple datasets under relatively simple attacks, even if they are less likely to do so in the absence of an attack.

LLMs are Limited Tools for Cyber-Attackers

Other tests shared in the UK AISI May 2024 update showed that four publicly available can solve simple capture the flag (CTF) challenges, of the sort aimed at high school students.

However, they all struggled with more complex problems, such as university-level cybersecurity challenges.

Finally, the UK AISI researchers showed that two models could autonomously solve some short-horizon tasks, such as software engineering problems, but none is currently able to plan and execute sequences of actions for more complex tasks.

These findings suggest that LLMs are likely not significantly helpful tools for cyber-attackers.

AI Chatbots Highly Vulnerable to Jailbreaks, UK Researchers Find

Kevin Poireault

Bypassing LLM Safeguards in 90%-100% of Cases

LLMs are Limited Tools for Cyber-Attackers

You may also like

AI Seoul Summit: 16 AI Companies Sign Frontier AI Safety Commitments

UK and US to Build Common Approach on AI Safety

#Infosec2024: Decoding SentinelOne's AI Threat Hunting Assistant

UK's AI Safety Institute Rebrands Amid Government Strategy Shift

DeepSeek's Flagship AI Model Under Fire for Security Vulnerabilities

What’s Hot on Infosecurity Magazine?

New Hacking Campaign Exploits Microsoft Windows WinRAR Vulnerability

Hundreds of Malicious Crypto Trading Add-Ons Found in Moltbot/OpenClaw

Two Critical Flaws in n8n AI Workflow Automation Platform Allow Complete Takeover

Smartphones Now Involved in Nearly Every Police Investigation

AI Drives Doubling of Phishing Attacks in a Year

SolarWinds Web Help Desk Vulnerability Actively Exploited

NSA Publishes New Zero Trust Implementation Guidelines

Cybersecurity M&A Roundup: CrowdStrike and Palo Alto Networks Lead Investment in AI Security

Data Privacy Day: Why AI’s Rise Makes Protecting Personal Data More Critical Than Ever

Over 80% of Ethical Hackers Now Use AI

New CISA Guidance Targets Insider Threat Risks

Number of Cybersecurity Pros Surges 194% in Four Years

Securing M365 Data and Identity Systems Against Modern Adversaries

Five Non-Negotiable Strategies to Get Identity Security Right in 2026

How to Implement Attack Surface Management in the AI and Cloud Age

Cyber Resilience in the AI Era: New Challenges and Opportunities

Safeguarding Critical Supply Chain Data Through Effective Risk Assessment

Dispelling the Myths of Defense-Grade Cybersecurity

Regulating AI: Where Should the Line Be Drawn?

What Is Vibe Coding? Collins’ Word of the Year Spotlights AI’s Role and Risks in Software

Risk-Based IT Compliance: The Case for Business-Driven Cyber Risk Quantification

Bridging the Divide: Actionable Strategies to Secure Your SaaS Environments

NCSC Set to Retire Web Check and Mail Check Tools

Beyond Bug Bounties: How Private Researchers Are Taking Down Ransomware Operations

AI Chatbots Highly Vulnerable to Jailbreaks, UK Researchers Find

Written by

Bypassing LLM Safeguards in 90%-100% of Cases

LLMs are Limited Tools for Cyber-Attackers

You may also like

What’s Hot on Infosecurity Magazine?