How to Build AI That Outperforms Human Auditors
I am not an expert on the low-level principles of AI or LLMs. However, my observations of the latest models, combined with my experience finding bugs, have led me to one conclusion: AI is already intelligent enough to understand and discover the very vulnerabilities I hunt for.
So why does a simple prompt like "spot the bug in the code below" fail so miserably? I believe the issue is the absence of a structured process, prompt engineering, and human intervention. Here are some directions we could explore:
1. Prove Correctness. Finding a bug is an outcome; the real work is proving the code's correctness, and a bug is simply what you find when a proof attempt fails. An audit report generated by AI shouldn't be a list of potential issues. It should be a complete, structured argument, framed in natural language like a math paper, that logically establishes the code's correctness.
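To make this concrete, here is a minimal sketch of how such a proof-style report could be organized. The `Claim` structure is a hypothetical illustration of my own, not an existing tool: a tree of claims in which every claim without a supporting argument is an open obligation, and therefore a place a bug might hide.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Claim:
    """One node of a proof-style audit report (hypothetical structure)."""
    statement: str                   # what we assert about the code
    argument: Optional[str] = None   # natural-language justification, if proven
    depends_on: List["Claim"] = field(default_factory=list)

    def open_obligations(self) -> List["Claim"]:
        """Unproven claims are exactly where a bug (or missing reasoning) may hide."""
        pending = [] if self.argument else [self]
        for sub in self.depends_on:
            pending.extend(sub.open_obligations())
        return pending

# Example: a top-level safety claim resting on two sub-claims, one still unproven.
report = Claim(
    statement="withdraw() never pays out more than the caller's balance",
    argument="follows from the two sub-claims below",
    depends_on=[
        Claim(statement="the balance is decreased before the external call",
              argument="checks-effects-interactions order confirmed by inspection"),
        Claim(statement="no other function can credit msg.sender's balance mid-call"),
    ],
)
print([c.statement for c in report.open_obligations()])
# -> ["no other function can credit msg.sender's balance mid-call"]
```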
2. Adopt an Attacker's Mindset. The AI must think like an adversary. (A rough prompt sketch follows this list.)
- It should construct a range of realistic scenarios and relentlessly ask "what if?" to anticipate unexpected events.
- When it reads code, it should form a hypothesis and challenge it: "Is this assumption always true?"
- When it spots a specific detail, it must probe deeper: "What if the input here is malicious? How could this be exploited as a step in a larger attack?"
- When it is trying to understand the protocol's mechanism, it must ask, "Could this go wrong?"
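Here is the rough prompt sketch mentioned above. The wording of the questions and the `ask_model` helper are assumptions for illustration, not a tested pipeline:

```python
# Hypothetical adversarial-review pass: the questions mirror the list above;
# ask_model() stands in for whatever LLM client you use (an assumption here).
ADVERSARIAL_PROMPTS = [
    "List every assumption this function makes about its inputs and callers. "
    "For each one, is it always true? Construct a scenario where it is not.",
    "Assume every external input here is attacker-controlled. "
    "What is the worst state this function can reach?",
    "Could this step be used as one link in a longer attack chain? Sketch the chain.",
    "Walk through the protocol mechanism this code implements. "
    "Where could it go wrong under reentrancy, rounding, or ordering issues?",
]

def adversarial_pass(code_snippet: str, ask_model) -> list:
    """Run each 'what if?' question against one snippet and collect hypotheses."""
    findings = []
    for prompt in ADVERSARIAL_PROMPTS:
        answer = ask_model(f"{prompt}\n\nCode under review:\n{code_snippet}")
        findings.append(answer)
    return findings
```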
3. The AI Must Engage in Dialogue with Humans. An ideal AI audit process is a conversation between the AI and its human counterparts (developers or auditors), since the code itself never tells the whole story. (A sketch of such a dialogue loop follows this list.)
- When the AI is uncertain about the system's intended behavior, it must ask the developer for clarification.
- A human auditor can provide inspiration to the AI ("What if this scenario happens?" or "Is this kind of attack possible here?").
- A human can challenge the AI's reasoning ("Are you sure this part of the proof is correct? Why?"), forcing it to provide a deeper explanation.
- Crucially, the AI must be able to ask for help. There will be times when it can neither prove correctness nor produce a concrete exploit (even after fuzzing). These ambiguous areas are often the most critical parts of an audit: they require human insight to either complete the proof or find the flaw, which the AI can then verify. Here is an example where I (a human) provided a proof: https://github.com/Vectorized/solady/blob/main/audits/xuwinnie-solady-cbrt-proof.pdf, and here is one where a counterexample was constructed: https://github.com/code-423n4/2023-05-maia-findings/issues/435
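Here is the dialogue-loop sketch mentioned above. The `audit_step` callable and its return format are hypothetical, purely to show the shape of the interaction:

```python
# Hypothetical interactive loop: the AI either reports a conclusion, asks the
# human (developer/auditor) for clarification, or admits it is stuck.
def audit_dialogue(audit_step, max_turns: int = 20):
    """audit_step(human_reply) -> dict with keys 'kind' and 'text' (assumed contract)."""
    human_reply = None
    for _ in range(max_turns):
        step = audit_step(human_reply)
        if step["kind"] == "question":       # AI is uncertain about intended behavior
            print("AI asks:", step["text"])
            human_reply = input("Your answer: ")
        elif step["kind"] == "needs_help":   # neither proof nor exploit found
            print("AI is stuck on:", step["text"])
            human_reply = input("Hint, proof sketch, or counterexample: ")
        else:                                # 'conclusion': a proof or a confirmed bug
            print("AI concludes:", step["text"])
            return step
    return None
```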
4. Use Tools. With natural language as the proof's framework, the AI can use formal verification to validate small, atomic assertions; and when a candidate bug appears, it can simulate a realistic environment and run a PoC.
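For the small, atomic assertions, an off-the-shelf SMT solver can already serve as the checking tool. A minimal sketch using Z3's Python bindings (the specific rounding claim is only an illustration of the kind of assertion a natural-language proof might lean on):

```python
# Requires the z3-solver package.
from z3 import BitVec, UDiv, ULE, Not, Solver, unsat

# Atomic claim the natural-language proof might depend on (illustrative):
# for any 256-bit unsigned x, (x / 2) * 2 never exceeds x.
x = BitVec("x", 256)
claim = ULE(UDiv(x, 2) * 2, x)

s = Solver()
s.add(Not(claim))                  # search for a counterexample
if s.check() == unsat:
    print("Assertion holds for all 256-bit values.")
else:
    print("Counterexample:", s.model())
```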
Given the vast public databases of known vulnerabilities, there's a straightforward way to test if a framework like this is powerful enough: back-test it. Can it, either independently or with reasonable human guidance, rediscover these known bugs?
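A minimal sketch of such a back-test harness, assuming a hypothetical `run_audit` pipeline, a dataset of known findings, and a `matches` heuristic (all three are placeholders):

```python
# Hypothetical back-test: re-run the audit pipeline on codebases with known,
# published vulnerabilities and count how many it rediscovers.
def backtest(known_findings, run_audit, matches):
    """known_findings: [{'codebase': ..., 'description': ...}, ...] (assumed format).
    run_audit(codebase) -> list of reported issues; matches(issue, finding) -> bool."""
    rediscovered = 0
    for finding in known_findings:
        issues = run_audit(finding["codebase"])
        if any(matches(issue, finding) for issue in issues):
            rediscovered += 1
        else:
            print("Missed:", finding["description"])
    recall = rediscovered / len(known_findings) if known_findings else 0.0
    print(f"Rediscovered {rediscovered}/{len(known_findings)} ({recall:.0%})")
    return recall
```

Recall against a curated set of known findings would be a blunt but honest first metric; human guidance given during the back-test should be logged so it can be discounted.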
I'm eager to see more research progress in this direction. If you have other ideas, I'd love to hear them in the comments.