Google Researchers Claim First Vulnerability Found Using AI

Researchers from Google Project Zero and Google DeepMind have found their first real-world vulnerability using a large language model (LLM).

In a November 1 blog post, Google Project Zero researchers said the vulnerability is an exploitable stack buffer underflow in SQLite, a widely used open-source database engine.

A team from Project Zero and DeepMind, working under the Big Sleep project, found the flaw in early October before it appeared in an official release. They immediately reported it to the developers, who fixed it the same day. SQLite users were not impacted.

“The vulnerability is quite interesting, along with the fact that the existing testing infrastructure for SQLite (both through OSS-Fuzz and the project's own infrastructure) did not find the issue, so we did some further investigation,” the Big Sleep researchers wrote.

From Naptime Framework to Big Sleep Project

The hybrid team’s AI-powered vulnerability research builds on the work started in 2023 within Project Zero to develop Naptime, a framework enabling an LLM to assist vulnerability researchers.

The framework’s architecture is centered around the interaction between an AI agent and a target codebase. The agent is given a set of specialized tools designed to mimic the workflow of a human security researcher.

Infosecurity reported on Naptime in June 2024.

Filling the Fuzzing Failures Gap

While the Big Sleep researchers highlighted that the project is still in the early stages and they only have highly experimental results, they also believe it has “tremendous defensive potential.”

Currently, the most common way developers test software before it goes into production is fuzzing.

Also known as fuzz testing, fuzzing involves providing invalid, unexpected or random data as inputs to a computer program or software. The program is then monitored for exceptions such as crashes, failing built-in code assertions, or potential memory leaks.
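To make the idea concrete, here is a minimal sketch of a random fuzzer in Python. It is illustrative only and is not the tooling used by OSS-Fuzz or the SQLite project: the target function `parse_record`, its deliberate bug, and the `fuzz` helper are all hypothetical names invented for this example.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Hypothetical target: the first byte declares the payload length."""
    if not data:
        # Clean, expected rejection of invalid input.
        raise ValueError("empty input")
    length = data[0]
    payload = data[1:]
    # Deliberate bug: the declared length is trusted without checking
    # that the payload is actually that long.
    last = payload[length - 1] if length else 0
    return bytes([last])


def fuzz(target, iterations=1000, seed=0):
    """Feed random byte strings to `target` and record unexpected exceptions."""
    rng = random.Random(seed)  # fixed seed so runs are reproducible
    crashes = []
    for _ in range(iterations):
        size = rng.randrange(0, 8)
        data = bytes(rng.randrange(256) for _ in range(size))
        try:
            target(data)
        except ValueError:
            continue  # the target rejected the input cleanly
        except Exception as exc:  # anything else counts as a "crash"
            crashes.append((data, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record)
    print(f"{len(found)} crashing inputs found")
    if found:
        print("example:", found[0])
```

Production fuzzers are far more sophisticated, typically using code-coverage feedback to guide input generation, but the loop is the same in spirit: generate inputs, run the target, and watch for failures.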

However, fuzzing failed to detect the SQLite vulnerability in this case. Several complex factors were involved, but the bottom line is that the fuzzing setups – the automated testing tools – lacked the specific configurations and code versions needed to trigger the issue.

Another common issue is that unknown vulnerabilities, also known as zero-days, are often variants of known and fixed vulnerabilities.

“As this trend continues, it's clear that fuzzing is not succeeding at catching such variants, and that for attackers, manual variant analysis is a cost-effective approach,” the Big Sleep researchers wrote.

“By providing a starting point – such as the details of a previously fixed vulnerability – we remove a lot of ambiguity from vulnerability research, and start from a concrete, well-founded theory: ‘This was a previous bug; there is probably another similar one somewhere.’”

While the researchers conceded that, overall, fuzzing will continue to be at least as effective as LLM-assisted manual vulnerability analysis, they hope “AI can narrow this gap.”

“We hope that in the future this effort will lead to a significant advantage to defenders – with the potential not only to find crashing test cases, but also to provide high-quality root-cause analysis, triaging and fixing issues could be much cheaper and more effective in the future.”

At this time, the Big Sleep researchers only use small programs with known vulnerabilities to evaluate the progress of their method.

Previous Records of Successful LLM-Assisted Vulnerability Research

While the researchers claimed that this is the first public example of an AI agent finding a previously unknown exploitable memory-safety issue in widely used real-world software, Alfredo Ortega, a security researcher at Neuroengine, said on X that he had discovered a zero-day in OpenBSD using LLMs back in April 2024 and published his result in June.

He also mentioned the work of Google's Open Source Security Team, who found an out-of-bounds read in OpenSSL in October.

"I think it's just an honest mistake, quite common in academic circles. Academics are usually not super aware of what happens outside their circle. They cannot know everything that is published in the field. But they just needed to Google it," he told Infosecurity.

Kevin Poireault
