Message-Id: <2C67A739-491E-4672-94F1-5C78DBC55C97@dwheeler.com>
Date: Mon, 19 Aug 2024 17:02:29 -0400
From: "David A. Wheeler" <dwheeler@...eeler.com>
To: oss-security@...ts.openwall.com
Subject: Re: AI Cyber Challenge (AIxCC) semi-final results from
 DEF CON 32 (2024)


> On Aug 17, 2024, at 4:32 PM, Alfredo Ortega <ortegaalfredo@...il.com> wrote:
> 
> I found a real bug (OpenBSD IPv6 Multicast Forwarding Cache sysctl
> kernel heap overflow) using Mistral-Medium almost 6 months ago:
> https://github.com/ortegaalfredo/vulns-ai/blob/main/openbsd_mfc6_sysctl_overflow.txt
> 
> The simple tool that did it is also released as open-source here:
> 
> https://github.com/ortegaalfredo/autokaker
> 
> About to release the second version, and a vscode plugin, next week.

That's even more evidence that LLMs can find at least some vulnerabilities.

Also - here's a visualization that tries to show how AIxCC competitors
did against the challenge problems:
https://dashboard.aicyberchallenge.com/collectivesolvehealth

You can see that the tools found and fixed many of the seeded vulnerabilities
in nginx and a few in all but one of the other projects, and they struggled
with the Linux kernel. The Linux kernel is *huge* compared to most projects,
so that isn't too surprising. The final competition is in about a year, so
there's hope that the tools will improve in that time as part of the challenge.

To be honest, even finding and fixing *some* problems automatically is a big
win, especially if false reports are rare. Still, the better the tools are at finding and
fixing vulnerabilities, the better off we are.

--- David A. Wheeler
