AIware 2024
Mon 15 - Tue 16 July 2024 Porto de Galinhas, Brazil
co-located with FSE 2024

Recent work in automated program repair (APR) proposes using reasoning and patch validation feedback to reduce the semantic gap between LLMs and the code under analysis. The idea has been shown to perform well for general APR, but its effectiveness in more specific contexts remains underexplored.

In this work, we assess the impact of providing reasoning and patch validation feedback to LLMs in the context of vulnerability repair, an important and challenging task in security. To support the evaluation, we present VRpilot, an LLM-based vulnerability repair technique based on reasoning and patch validation feedback. VRpilot (1) uses a chain-of-thought prompt to reason about a vulnerability before generating patch candidates and (2) iteratively refines prompts according to the output of external tools (e.g., compiler, code sanitizers, test suite) on previously generated patches.
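The two-step loop described above can be illustrated with a minimal Python sketch. This is not VRpilot's implementation: the names `build_prompt`, `repair`, `ValidationResult`, the prompt wording, and the round limit are all assumptions made for illustration, and the LLM and the external tools are replaced by small stand-in functions so the example runs on its own.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class ValidationResult:
    tool: str      # hypothetical label, e.g. "compiler", "sanitizer", "tests"
    passed: bool
    message: str   # diagnostic text that is fed back into the next prompt


# Assumed interfaces: a generator maps a prompt to a patch candidate,
# and a validator maps a patch candidate to a ValidationResult.
Generator = Callable[[str], str]
Validator = Callable[[str], ValidationResult]


def build_prompt(code: str, vuln_description: str, feedback: List[str]) -> str:
    """Assemble a chain-of-thought style prompt plus any earlier tool feedback."""
    parts = [
        "The following code contains a vulnerability: " + vuln_description,
        code,
        "First, reason step by step about the root cause. "
        "Then output a patched version.",
    ]
    for msg in feedback:
        parts.append("A previous patch was rejected: " + msg)
    return "\n\n".join(parts)


def repair(code: str, vuln_description: str, generate: Generator,
           validators: List[Validator], max_rounds: int = 3) -> Optional[str]:
    """Return the first patch that passes every validator, or None."""
    feedback: List[str] = []
    for _ in range(max_rounds):
        patch = generate(build_prompt(code, vuln_description, feedback))
        # Run validators in order and stop at the first failure.
        failure = next(
            (r for r in (v(patch) for v in validators) if not r.passed), None
        )
        if failure is None:
            return patch
        feedback.append(f"[{failure.tool}] {failure.message}")
    return None


def fake_generate(prompt: str) -> str:
    """Stand-in for an LLM call: it only fixes the bug after seeing feedback."""
    if "rejected" in prompt:
        return "strncpy(dst, src, sizeof(dst) - 1);"
    return "strcpy(dst, src);"


def fake_sanitizer(patch: str) -> ValidationResult:
    """Stand-in for a sanitizer run: flags any unbounded strcpy call."""
    ok = "strcpy(" not in patch
    return ValidationResult("sanitizer", ok, "" if ok else "unbounded copy into dst")
```

With these stand-ins, `repair("strcpy(dst, src);", "buffer overflow", fake_generate, [fake_sanitizer])` fails validation in the first round, appends the sanitizer message to the prompt, and returns the bounded copy in the second round. Setting `max_rounds=1` returns `None`, since no feedback round is available.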

To evaluate performance, we compare VRpilot against the state-of-the-art vulnerability repair techniques for C and Java using public datasets from the literature. Our results show that VRpilot generates, on average, 14% and 7.6% more correct patches than the baseline techniques on C and Java, respectively. We show, through an ablation study, that reasoning and patch validation feedback are critical. We report several lessons from this study and potential directions for advancing LLM-empowered vulnerability repair.

Mon 15 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

16:00 - 18:00
Security and Safety + Round Table + Day 1 Closing
Main Track / Late Breaking Arxiv Track at Mandacaru
Chair(s): Thomas Zimmermann Microsoft Research, Ahmed E. Hassan Queen’s University
16:00
5m
Paper
An AI System Evaluation Framework for Advancing AI Safety: Terminology, Taxonomy, Lifecycle Mapping
Main Track
Boming Xia CSIRO's Data61 & University of New South Wales, Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61, Zhenchang Xing CSIRO's Data61
16:05
5m
Paper
Measuring Impacts of Poisoning on Model Parameters and Embeddings for Large Language Models of Code
Main Track
Aftab Hussain University of Houston, Md Rafiqul Islam Rabin University of Houston, Amin Alipour University of Houston
16:10
10m
Paper
A Case Study of LLM for Automated Vulnerability Repair: Assessing Impact of Reasoning and Patch Validation Feedback
Main Track
Ummay Kulsum North Carolina State University, Haotian Zhu Singapore Management University, Bowen Xu North Carolina State University, Marcelo d'Amorim North Carolina State University
16:20
5m
Paper
Trojans in Large Language Models of Code: A Critical Review through a Trigger-Based Taxonomy
Late Breaking Arxiv Track
Aftab Hussain University of Houston, Md Rafiqul Islam Rabin University of Houston, Toufique Ahmed University of California at Davis, Bowen Xu North Carolina State University, Premkumar Devanbu UC Davis, Amin Alipour University of Houston
Pre-print
16:25
25m
Live Q&A
Session Q&A and topic discussions
Main Track

16:50
60m
Panel
Round Table
Main Track

17:50
10m
Day closing
Day 1 summary and closing
Main Track