Simple Science

Cutting edge science explained simply

Computer Science · Software Engineering · Computation and Language · Machine Learning

Improving Bug Fixing with Large Language Models

A new method uses LLMs to enhance program repair efficiency.

― 5 min read


LLMs Transform Bug Fixing: A new approach boosts efficiency in automated program repair.

Program repair is a vital aspect of software development. Developers spend a significant amount of their time fixing bugs; studies suggest that bug fixing can consume more than 35% of regular development time. To ease this workload, researchers have been working on Automated Program Repair (APR) methods that aim to automate the bug-fixing process.

What is Automated Program Repair?

Automated Program Repair methods can be classified into several types, including heuristic-based, constraint-based, template-based, and learning-based methods. Each type has its own approach to generating code patches, which are the modifications applied to fix bugs in software code. Traditional methods often rely on fixed patterns designed to handle specific bugs, while learning-based methods utilize large sets of data containing buggy and fixed code to train models that can suggest fixes.
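To make the notion of a patch concrete, here is a minimal, hypothetical example (not taken from the paper): a small buggy function and the one-line modification that fixes it.

```python
# A minimal, hypothetical example of a bug and its patch (not from the paper).

def buggy_average(values):
    # Bug: divides by a hard-coded constant instead of the list length.
    return sum(values) / 2

def fixed_average(values):
    # The "patch" is the single-line change: divide by len(values) instead of 2.
    return sum(values) / len(values)

if __name__ == "__main__":
    data = [1, 2, 3, 4]
    print(buggy_average(data))  # 5.0 (wrong)
    print(fixed_average(data))  # 2.5 (correct)
```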

The Role of Large Language Models

Recently, Large Language Models (LLMs) like GPT-4 have provided promising results for automated program repair. LLMs are models trained on vast amounts of text data and have capabilities that allow them to understand and generate code effectively. Researchers are increasingly turning to LLMs because they have shown strong performance in code comprehension and generation tasks.

Challenges with Current Approaches

Despite these advancements, current methods face two major challenges. First, there is a misalignment between the training objective of LLMs and the way program repair methods use them. Decoder-only LLMs are trained to predict the next token in a sequence, yet many repair methods ask them to fill in masked gaps within the code, which reduces performance. Second, existing workflows separate fault localization from patch generation: the model is asked to fix only the statements flagged as buggy, which keeps it from exploring fixes beyond the given locations.
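To illustrate the mismatch, the sketch below contrasts the two prompting styles in rough form; the exact prompt wording is an assumption for illustration, not the format used in the paper. The infilling-style prompt asks the model to predict a masked span, while the whole-function prompt asks it to regenerate the entire function, which matches ordinary next-token prediction.

```python
# Rough, illustrative contrast of the two prompting styles discussed above.

buggy_function = """def average(values):
    return sum(values) / 2
"""

# Infilling style: the suspected buggy line is masked, and the model must
# predict only the span behind the placeholder. Decoder-only LLMs are not
# trained on this masked-span objective.
infilling_prompt = """Fill in the masked line so the function is correct:
def average(values):
    <MASK>
"""

# Whole-function style: the model sees the complete buggy function and is
# asked to regenerate it, which is ordinary left-to-right text generation.
whole_function_prompt = f"""The following function is buggy. Rewrite the entire
function so that it is correct:

{buggy_function}
"""

print(infilling_prompt)
print(whole_function_prompt)
```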

Proposed Approach

This work introduces a new method for program repair using LLMs. The key points of the proposed approach are:

  1. Aligning the output of LLMs with their training objectives will enhance their ability to generate effective code fixes.
  2. Allowing LLMs to refine entire functions instead of just focusing on specific sections leads to better program repair outcomes.

Framework Development

Based on these insights, a straightforward prompting framework for automated program repair was designed, named Direct Debug Drives Decent Code (D4C). The framework lets an LLM fix a buggy program by generating a complete, refined version of it from various inputs, such as related documents and error reports.

How It Works

The framework generates a refined version of a buggy program by drawing on relevant information and artifacts from error reports and failed tests. The LLM is prompted with these details to produce more accurate fixes without first localizing the fault. The innovative aspect of this approach is that it works on the complete function rather than on narrower, masked regions of code.
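A rough sketch of this kind of repair loop is shown below. The prompt wording, the `complete` client, and the helper names are placeholders assumed for illustration; only the overall shape, rewriting the whole function from a prompt that bundles the buggy code with its artifacts and sampling it a small number of times, reflects the description above.

```python
# Hypothetical sketch of a D4C-style repair loop; `complete` stands in for
# whatever LLM client is actually used and is not part of the paper.

from typing import Callable, List

def build_prompt(buggy_function: str, artifacts: str) -> str:
    # Combine the complete buggy function with artifacts such as the
    # documentation and the report from a failing test.
    return (
        "You are fixing a bug. Here is the buggy function:\n"
        f"{buggy_function}\n"
        "Relevant artifacts (documentation, failing test report):\n"
        f"{artifacts}\n"
        "Rewrite the entire function so that all tests pass."
    )

def repair(
    buggy_function: str,
    artifacts: str,
    complete: Callable[[str], str],
    num_samples: int = 10,  # the paper samples each patch only 10 times
) -> List[str]:
    # Sample a handful of candidate whole-function rewrites; no prior
    # statement-level fault localization is required.
    prompt = build_prompt(buggy_function, artifacts)
    return [complete(prompt) for _ in range(num_samples)]
```

Because the model rewrites the whole function, there is no separate step that first narrows the repair down to individual statements.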

Findings

The findings indicate that the new method outperforms existing state-of-the-art program repair methods: it correctly repairs 180 bugs in the Defects4J benchmark, a 10% improvement over methods that rely on perfect fault localization. It also needs only 10 samples per patch, reducing the number of sampled patches by about 90% and demonstrating the efficiency of the approach.

Importance of Objective Alignment

One of the essential insights is that aligning the output of LLMs with their training objectives is necessary for optimal performance. This means that by generating complete functions rather than filling in incomplete sections, LLMs can utilize their training more effectively. The experiments showed that the framework could produce more accurate and reliable code patches when working with complete functions.

Significance of Artifacts

Another key component was the use of artifacts, which include documents describing the code and inputs and outputs of failed tests. These artifacts help guide the LLMs in refining their outputs and allow them to locate and repair bugs more effectively. The results revealed that using these additional inputs greatly improved the performance of the repair process.
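As an illustration of how such artifacts might be rendered into prompt text (the concrete format used by the authors is not given in this summary), the function's documentation and a failing test's input, expected output, and observed output can be serialized into a short block that accompanies the buggy code:

```python
# Hypothetical formatting of repair artifacts into prompt text.

def format_artifacts(docstring: str, test_input: str, expected: str, observed: str) -> str:
    return (
        "Function documentation:\n"
        f"{docstring}\n\n"
        "Failing test:\n"
        f"  input:    {test_input}\n"
        f"  expected: {expected}\n"
        f"  observed: {observed}\n"
    )

artifacts = format_artifacts(
    docstring="Return the arithmetic mean of a non-empty list of numbers.",
    test_input="[1, 2, 3, 4]",
    expected="2.5",
    observed="5.0",
)
print(artifacts)
```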

Experiments and Results

Experiments were conducted on a variety of bugs to compare the proposed framework with other state-of-the-art methods. The comparisons showed that the new approach outperformed existing methods while requiring fewer samples for generating correct patches.

Evaluation Methods

The evaluation assessed the method's ability to generate correct patches for a variety of programming bugs, considering both the number of correct patches produced and the sampling budget needed to produce them. A patch was counted as correct if it passed all required tests.
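Under this criterion, validating a candidate patch amounts to running it against the full test suite. The sketch below is a simplification that assumes each test is a callable raising an exception on failure; it is not the evaluation harness used in the paper.

```python
# Minimal sketch of counting correct patches, assuming each test is a callable
# that raises an exception when it fails (a simplification for illustration).

from typing import Callable, List

def passes_all_tests(patched_program, tests: List[Callable]) -> bool:
    # A patch is accepted here only if every required test passes.
    for test in tests:
        try:
            test(patched_program)
        except Exception:
            return False
    return True

def count_correct(patches: List, tests: List[Callable]) -> int:
    # Count candidate patches that pass the full test suite.
    return sum(1 for patch in patches if passes_all_tests(patch, tests))
```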

Comparisons with Existing Methods

The results revealed that the new method not only generated a higher number of correct patches than previous models but also did so much more efficiently. While existing methods required numerous samples to achieve good results, the new approach only needed a fraction of that, showcasing its effectiveness.

Conclusion

The research highlights new strategies for using large language models in automated program repair. By aligning model outputs with training objectives and allowing for whole program refinement without prior fault localization, the proposed approach significantly enhances the effectiveness of program repair methods. This work lays the foundation for future research and encourages the reevaluation of traditional workflows in automated program repair. Through this innovative framework, the potential of LLMs in coding tasks can be better utilized, leading to more efficient and effective software development processes.

Original Source

Title: Aligning the Objective of LLM-based Program Repair

Abstract: Large language models (LLMs) have achieved decent results on automated program repair (APR). However, the next token prediction training objective of decoder-only LLMs (e.g., GPT-4) is misaligned with the masked span prediction objective of current infilling-style methods, which impedes LLMs from fully leveraging pre-trained knowledge for program repair. In addition, while some LLMs can locate and repair bugs in certain functions using the related artifacts (e.g., test cases), existing methods still depend on statement-level fault localization methods to provide a list of buggy hunks for repair. This restriction hinders LLMs from exploring potential patches beyond the given locations. In this paper, we investigate a new approach to adapt LLMs to program repair. Our core insight is that LLM's APR capability can be greatly improved by simply aligning the output to their training objective and allowing them to refine the whole program without first identifying faulty statements. Based on this insight, we designed D4C, a straightforward prompting framework for APR. D4C can repair 180 bugs correctly in Defects4J, with each patch being sampled only 10 times. This surpasses the SOTA APR methods with perfect fault localization by 10% and reduces the patch sampling number by 90%. Our findings reveal that (1) objective alignment is crucial for fully exploiting LLM's pre-trained capability, and (2) replacing the traditional localize-buggy-hunks-then-repair workflow with direct debugging is more effective for LLM-based APR methods. Thus, we believe this paper introduces a new mindset for harnessing LLMs in APR.

Authors: Junjielong Xu, Ying Fu, Shin Hwei Tan, Pinjia He

Last Update: 2025-01-01 00:00:00

Language: English

Source URL: https://arxiv.org/abs/2404.08877

Source PDF: https://arxiv.org/pdf/2404.08877

Licence: https://creativecommons.org/licenses/by/4.0/

Changes: This summary was created with assistance from AI and may have inaccuracies. For accurate information, please refer to the original source documents linked here.

Thank you to arXiv for use of its open access interoperability.
