
AI Hackers can find, exploit Zero-Day vulnerabilities

Erin Patten • June 17, 2024

Autonomous AI "hackers" are quickly becoming very sophisticated

In April of this year, a team of researchers from the University of Illinois Urbana-Champaign released a paper showing how they had used an LLM (Large Language Model), specifically GPT-4, to "autonomously exploit one-day vulnerabilities in real-world systems."


One-day vulnerabilities are security issues that have been publicly disclosed but not yet patched. When a vulnerability is discovered, it is assigned an identifier and added to the CVE (Common Vulnerabilities and Exposures) list, which also includes a description and severity level.
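To make the shape of a CVE entry concrete, here is a minimal sketch in Python. The CveRecord class, its field names, and the sample values are assumptions made for illustration and are not an official schema; the one fixed detail is the identifier format, CVE-<year>-<sequence number of four or more digits>.

```python
import re
from dataclasses import dataclass

# CVE identifiers look like CVE-<4-digit year>-<4 or more digits>.
CVE_ID_PATTERN = re.compile(r"CVE-\d{4}-\d{4,}")


@dataclass(frozen=True)
class CveRecord:
    """Hypothetical, simplified stand-in for one entry on the CVE list."""
    cve_id: str
    description: str
    severity: str  # illustrative label only, e.g. "HIGH"


def is_valid_cve_id(cve_id: str) -> bool:
    """Return True if the whole string matches the CVE identifier format."""
    return CVE_ID_PATTERN.fullmatch(cve_id) is not None


# A made-up placeholder entry, not a real vulnerability.
example = CveRecord(
    cve_id="CVE-2099-12345",
    description="Placeholder description of a flaw in an example product.",
    severity="HIGH",
)
```

The description field is the part that mattered most in the research: it is the text the GPT-4 agent was given as its starting point.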


The researchers showed that, when fed a CVE description, "GPT-4 is capable of exploiting 87% of these vulnerabilities compared to 0% for every other model we test (GPT-3.5, open-source LLMs) and open-source vulnerability scanners (ZAP and Metasploit)." "Fortunately," they added, "our GPT-4 agent requires the CVE description for high performance: without the description, GPT-4 can exploit only 7% of the vulnerabilities. Our findings raise questions around the widespread deployment of highly capable LLM agents."


Teamwork makes the Dream work


Only two months later, the team released another paper. Building on their previous research, they harnessed teams of LLMs to successfully exploit real-world zero-day vulnerabilities.


Zero-day vulnerabilities are security flaws that are not yet known to the creators of the affected software or hardware (or have only just been discovered) and are not yet patched. Obviously, it's hard to defend a weakness you know nothing about, so threat actors are constantly on the lookout for them.


This time, the researchers used a new technique they call HPTSA (Hierarchical Planning and Task-Specific Agents) to organize a team of LLMs the same way you might organize a project team: with a Planner, a Manager, and a group of specialized Task-Specific Agents. The Planner identifies potential weaknesses and comes up with a plan of attack. The Manager then decides which Agents are best suited to each task, deploying them and directing their work.
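The division of labor described above can be sketched as a small Python program. This is only an illustration of the hierarchy, not the researchers' implementation: every class, function, and category name is hypothetical, and each "agent" is a stub that returns a line of text instead of calling an LLM or touching a real system.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Task:
    category: str  # kind of weakness the planner suspects (illustrative label)
    target: str    # page or component to look at


# Stand-in for an LLM-backed specialist: takes a task, returns a report line.
Agent = Callable[[Task], str]


def make_stub_agent(name: str) -> Agent:
    """Build a placeholder specialist that only reports what it was asked to do."""
    def run(task: Task) -> str:
        return f"{name} agent reviewed {task.target}"
    return run


class Planner:
    """Turns a map of pages and suspected weaknesses into a list of tasks."""
    def plan(self, site_map: Dict[str, List[str]]) -> List[Task]:
        return [Task(category, page)
                for page, categories in site_map.items()
                for category in categories]


class Manager:
    """Routes each task to the matching specialist, if one exists."""
    def __init__(self, agents: Dict[str, Agent]) -> None:
        self.agents = agents

    def dispatch(self, tasks: List[Task]) -> List[str]:
        reports = []
        for task in tasks:
            agent = self.agents.get(task.category)
            if agent is None:
                reports.append(f"no specialist for {task.category} on {task.target}")
            else:
                reports.append(agent(task))
        return reports


# Usage with made-up pages and categories.
agents = {"sqli": make_stub_agent("sqli"), "xss": make_stub_agent("xss")}
site_map = {"/login": ["sqli"], "/search": ["xss", "csrf"]}
reports = Manager(agents).dispatch(Planner().plan(site_map))
for line in reports:
    print(line)
```

The point of the structure is that no single agent has to hold the whole job in its head: the Planner only decides what to look at, the Manager only decides who looks, and each specialist only handles its own kind of problem.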


This approach was tested on a set of vulnerabilities that the researchers knew about, but the LLMs were not given that information, mimicking a zero-day scenario. The LLM team was able to successfully exploit over 50% of the vulnerabilities tested.



A whole new ballgame for Cybersecurity


Now that it has been shown that threat actors could use AI to autonomously hack websites, defenders will need to keep pace. Luckily, the same method can be used for penetration testing: probing systems to spot zero-day vulnerabilities and patch them before others find them. It's easy to imagine that HPTSA will have a huge impact not only on cybersecurity but also on expanding the use of LLMs in unforeseen directions, for good or bad.


As the researchers themselves concluded:

"It is unclear whether AI agents will aid cybersecurity offense or defense more and we hope that future work addresses this question. Beyond the immediate impact of our work, we hope that our work inspires frontier LLM providers to think carefully about their deployments."



Sources:

Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. LLM Agents can Autonomously Exploit One-Day Vulnerabilities. arXiv preprint arXiv:2404.08144, 2024. https://arxiv.org/abs/2404.08144

Richard Fang, Rohan Bindu, Akul Gupta, and Daniel Kang. Teams of LLM Agents can Exploit Zero-Day Vulnerabilities. arXiv preprint arXiv:2406.01637, 2024. https://arxiv.org/abs/2406.01637


This post, like all our posts, is 100% written by a human.
