Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs (AIware 2024 - Main Track)

Who

Sylvain Kouemo Ngassom, Arghavan Moradi Dakhel, Florian Tambon, Foutse Khomh

Track

AIware 2024 Main Track

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 16 Jul 2024, 11:50 - 12:00, at Mandacaru. Session: Industry Talk3 + AIware for Code. Chair(s): Yiling Lou

Abstract

LLM-based assistants, such as GitHub Copilot and ChatGPT, can generate code that fulfills a programming task described in natural language, referred to as a prompt. The widespread accessibility of these assistants enables users with diverse backgrounds to generate code and integrate it into software projects. However, studies show that code generated by LLMs is prone to bugs and may miss various corner cases in task specifications. Presenting such buggy code to users can undermine their trust in LLM-based assistants and the perceived reliability of those assistants. Moreover, the user must invest significant effort to detect and repair any bug present in the code, especially if no test cases are available. In this study, we propose a self-refinement method aimed at improving the reliability of code generated by LLMs by minimizing the number of bugs before execution, without human intervention, and in the absence of test cases. Our approach is based on targeted Verification Questions (VQs) that identify potential bugs within the initial code. These VQs target the nodes within the Abstract Syntax Tree (AST) of the initial code that can trigger specific types of bug patterns commonly found in LLM-generated code. Finally, our method attempts to repair these potential bugs by re-prompting the LLM with the targeted VQs and the initial code. Our evaluation, based on programming tasks in the CoderEval dataset, demonstrates that our proposed method outperforms state-of-the-art methods by decreasing the number of targeted errors in the code by 21% to 62% and improving the number of executable code instances to 13%.
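The pipeline described in the abstract (walk the AST of the initial code, emit a verification question for each node that could trigger a targeted bug pattern, then re-prompt the LLM with the questions and the code) might look roughly like the sketch below. Everything in it is an illustrative assumption rather than the authors' implementation: the two bug patterns chosen (possibly undefined names and possibly wrong attributes), the question templates `NAME_VQ` and `ATTR_VQ`, and the helper names `generate_vqs` and `build_repair_prompt` are all hypothetical. The sketch stops at building the repair prompt and does not call any LLM.

```python
import ast
import builtins

# Hypothetical question templates; the wording used in the paper may differ.
NAME_VQ = "Line {line}: is the name `{name}` defined or imported before it is used?"
ATTR_VQ = "Line {line}: does `{obj}` really provide the attribute `{attr}`?"


def generate_vqs(code: str) -> list[str]:
    """Return verification questions for AST nodes that may hide a bug."""
    tree = ast.parse(code)

    # First pass: collect names the snippet itself binds (assumed heuristic).
    defined = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            defined.add(node.name)
        elif isinstance(node, ast.arg):
            defined.add(node.arg)
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            defined.add(node.id)
        elif isinstance(node, ast.alias):
            defined.add((node.asname or node.name).split(".")[0])

    # Second pass: target name loads that are never bound, and every attribute access.
    questions: list[str] = []
    for node in ast.walk(tree):
        if (isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
                and node.id not in defined):
            question = NAME_VQ.format(line=node.lineno, name=node.id)
        elif isinstance(node, ast.Attribute):
            question = ATTR_VQ.format(
                line=node.lineno, obj=ast.unparse(node.value), attr=node.attr)
        else:
            continue
        if question not in questions:  # avoid asking the same question twice
            questions.append(question)
    return questions


def build_repair_prompt(code: str, questions: list[str]) -> str:
    """Combine the initial code and its questions into one repair prompt."""
    numbered = "\n".join(f"{i}. {q}" for i, q in enumerate(questions, 1))
    return (
        "Answer each verification question about the code below, "
        "then return a corrected version of the code.\n\n"
        f"Code:\n{code}\n\nVerification questions:\n{numbered}\n"
    )


if __name__ == "__main__":
    # `np` is used without being imported, a typical slip in generated code.
    initial_code = "def mean(xs):\n    return np.sum(xs) / len(xs)\n"
    print(build_repair_prompt(initial_code, generate_vqs(initial_code)))
```

Because the questions are derived from the code's structure alone, this kind of check needs neither test cases nor code execution, which matches the setting the abstract describes. The resulting prompt would then be sent back to the same LLM for repair.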

DOI

https://doi.org/10.1145/3664646.3664772

Sylvain Kouemo Ngassom

Polytechnique Montréal

Canada

Arghavan Moradi Dakhel

Polytechnique Montréal

Canada

Florian Tambon

Polytechnique Montréal

Canada

Foutse Khomh

Polytechnique Montréal

Canada

Session Program

Tue 16 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30	Industry Talk3 + AIware for Code (Main Track / Industry Statements and Demo Track) at Mandacaru. Chair(s): Yiling Lou (Fudan University)

11:00 20m Industry talk	AI-assisted User Intent Formalization for Programs: Problem and Applications. Industry Statements and Demo Track. Shuvendu K. Lahiri (Microsoft Research)
11:20 10m Paper	Identifying the Factors That Influence Trust in AI Code Completion. Main Track. Adam Brown (Google), Sarah D'Angelo (Google), Ambar Murillo (Google), Ciera Jaspan (Google), Collin Green (Google)
11:30 10m Paper	A Transformer-Based Approach for Smart Invocation of Automatic Code Completion. Main Track. Aral de Moor (Delft University of Technology), Arie van Deursen (Delft University of Technology), Maliheh Izadi (Delft University of Technology)
11:40 10m Paper	Leveraging Machine Learning for Optimal Object-Relational Database Mapping in Software Systems. Main Track. Sasan Azizian (University of Nebraska-Lincoln), Elham Rastegari (Creighton University), Hamid Bagheri (University of Nebraska-Lincoln)
11:50 10m Paper	Chain of Targeted Verification Questions to Improve the Reliability of Code Generated by LLMs. Main Track. Sylvain Kouemo Ngassom (Polytechnique Montréal), Arghavan Moradi Dakhel (Polytechnique Montréal), Florian Tambon (Polytechnique Montréal), Foutse Khomh (Polytechnique Montréal)
12:00 30m Live Q&A	Session Q&A and topic discussions. Main Track