Wednesday, June 4, 2025
LBNN
  • Business
  • Markets
  • Politics
  • Crypto
  • Finance
  • Energy
  • Technology
  • Taxes
  • Creator Economy
  • Wealth Management
  • Documentaries
No Result
View All Result
LBNN

AI-Powered Proof Generator Helps Debug Software

Simon Osuji by Simon Osuji
January 25, 2024
in Artificial Intelligence
0
AI-Powered Proof Generator Helps Debug Software
0
SHARES
2
VIEWS
Share on FacebookShare on Twitter


Not all software is perfect—many apps, programs, and websites are released despite bugs. But the software behind critical systems like cryptographic protocols, medical devices, and space shuttles must be error-free, and ensuring the absence of bugs requires going beyond code reviews and testing. It requires formal verification.

Formal verification involves writing a mathematical proof of your code and is “one of the hardest but also most powerful ways of making sure your code is correct,” says Yuriy Brun, a professorat the University of Massachusetts Amherst.

To make formal verification easier, Brun and his colleagues devised a new AI-powered method named Baldur to automatically generate proofs. The accompanying paper, presented in December 2023 at the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering in San Francisco, won a Distinguished Paper award. The team includes Emily First, who completed the study as part of her doctoral dissertation at UMass Amherst; Markus Rabe, a former researcher at Google, where the study was conducted; and Talia Ringer, an assistant professor at the University of Illinois Urbana-Champaign.

text with arrows and boxes This flowchart shows at a high level how Baldur generates a proof. University of Massachusetts/ University of Illinois Urbana-Champaign

Related posts

Adjustable Mattress vs. Adjustable Frame: Similar but Not the Same

Adjustable Mattress vs. Adjustable Frame: Similar but Not the Same

June 4, 2025
AI can help cut down on waste, improve quality in dyed fabrics

AI can help cut down on waste, improve quality in dyed fabrics

June 4, 2025

Baldur is powered by Google’s Minerva large language model (LLM)—which was trained on scientific papers and web pages containing mathematical expressions—and fine-tuned on data about proofs and theorems. Baldur works with Isabelle, a proof assistant or automated theorem prover,to check its proofs. When given a theorem statement, Baldur is able to generate an entire proof almost 41 percent of the time.

To boost Baldur’s success, the team fed the model additional contextual information—such as other definitions or the lines preceding the theorem statement in a theory file—and found that the proof rate increased to 47.5 percent. This means that Baldur is able to take the context and use it to predict a new correct proof, says First. Similar to a programmer who may be better equipped to fix a bug in a method when they know how that method relates to its surrounding code and the other methods in the same class, Baldur can perform better with extra context.

Thor, the current state-of-the-art tool for automatic proof generation, has a higher proof rate at 57 percent. Baldur’s advantage lies in its ability to generate whole proofs; Thor predicts the next step in a proof using a smaller language model combined with a method that searches the space of possible proofs. But when Thor and Baldur (Thor’s brother in Norse mythology) work together, the pair can generate correct proofs nearly 66 percent of the time.

The team also discovered that Baldur can “repair” its own proofs, further improving its proof rate. When provided with its previous failed attempt plus the error message returned by Isabelle, Baldur can turn its wrong proof into a right one.

“The fact that the error message helped so much was surprising,” Brun says. “[It] suggests that there’s more useful information that could potentially be fed into the large language model to give better answers. We’re just scratching the surface.”

The team has yet to find the right amount of information that would be deemed valuable for the model. “One limitation is that we’re giving it some information around the proof that it’s trying to generate, but we don’t know what’s the limit and where it stops being useful,” says Brun. And there’s still a considerable degree of error, which he hopes can be improved by fine-tuning the model using more datasets that better explain what a proof looks like and what a theorem and proof pair look like.

text with different color highlights, arrows and boxesThis flowchart shows how the proof repair model creates new training data for Baldur.University of Massachusetts/ University of Illinois Urbana-Champaign

In terms of next steps, First is looking at implementing the approach used for Baldur in other proof assistants, as well as uncovering ways to more smartly gather contextual information that can help enhance the model’s accuracy. The team envisions Baldur helping simplify the jobs of proof engineers, who are tasked with, for example, formal verification of national security systems at organizations like the U.S. Department of Defense and its Defense Advanced Research Projects Agency (DARPA).

On a broader scale, the team is planning on getting feedback from software developers to see how tools like Baldur can help them—be it through debugging an error in their code, refining specifications, or creating a higher-quality system.

“There’s a lot of power in building interactive tools where a developer is trying to prove some property of their code,” Brun says. “Understanding how developers can interact with these tools and supporting them by building automated approaches can take them even farther.”



Source link

Previous Post

Hywind maintenance a ‘lost opportunity’ for UK supply chain

Next Post

Defence Reserves without a number one

Next Post
Defence Reserves without a number one

Defence Reserves without a number one

Leave a Reply Cancel reply

Your email address will not be published. Required fields are marked *

RECOMMENDED NEWS

Solana (SOL) Transaction Volume Up Record 700% as $200 Nears

Solana (SOL) Transaction Volume Up Record 700% as $200 Nears

1 year ago
Seagreen wind farm now fully operational=

Seagreen wind farm now fully operational=

2 years ago
When Could ADA Reach $3, $7.03, and $10.40?

When Could ADA Reach $3, $7.03, and $10.40?

1 year ago
Top 10 African countries with the largest military aircraft fleet in 2025

Top 10 African countries with the largest military aircraft fleet in 2025

5 months ago

POPULAR NEWS

  • Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    Ghana to build three oil refineries, five petrochemical plants in energy sector overhaul

    0 shares
    Share 0 Tweet 0
  • When Will SHIB Reach $1? Here’s What ChatGPT Says

    0 shares
    Share 0 Tweet 0
  • Matthew Slater, son of Jackson State great, happy to see HBCUs back at the forefront

    0 shares
    Share 0 Tweet 0
  • Dolly Varden Focuses on Adding Ounces the Remainder of 2023

    0 shares
    Share 0 Tweet 0
  • US Dollar Might Fall To 96-97 Range in March 2024

    0 shares
    Share 0 Tweet 0
  • Privacy Policy
  • Contact

© 2023 LBNN - All rights reserved.

No Result
View All Result
  • Home
  • Business
  • Politics
  • Markets
  • Crypto
  • Economics
    • Manufacturing
    • Real Estate
    • Infrastructure
  • Finance
  • Energy
  • Creator Economy
  • Wealth Management
  • Taxes
  • Telecoms
  • Military & Defense
  • Careers
  • Technology
  • Artificial Intelligence
  • Investigative journalism
  • Art & Culture
  • Documentaries
  • Quizzes
    • Enneagram quiz
  • Newsletters
    • LBNN Newsletter
    • Divergent Capitalist

© 2023 LBNN - All rights reserved.