Effectiveness of ChatGPT for Static Analysis: How Far Are We? (AIware 2024 - Main Track)

Who

Mohammad Mahdi Mohajer, Reem Aleithan, Nima Shiri Harzevili, Moshi Wei, Alvine Boaye Belle, Hung Viet Pham, Song Wang

Track

AIware 2024 Main Track

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Tue 16 Jul 2024 14:50 - 15:00 at Mandacaru - Industry Talk4 + AIware for Software Lifecycle Activities Chair(s): Filipe Cogo

Abstract

This paper presents a novel study of the capabilities of ChatGPT, a state-of-the-art LLM, in static analysis tasks such as static bug detection and false-positive warning removal. Our evaluation focuses on two typical and critical bug types targeted by static bug detection, Null Dereference and Resource Leak. We employ Infer, a well-established static analyzer, to help gather instances of these two bug types from 10 open-source projects. The resulting experimental dataset contains 222 instances of Null Dereference bugs and 46 instances of Resource Leak bugs. Our study demonstrates that ChatGPT can achieve remarkable performance on both tasks. In static bug detection, ChatGPT achieves accuracy and precision of up to 68.37% and 63.76% for Null Dereference bugs and 76.95% and 82.73% for Resource Leak bugs, improving on the precision of the current leading bug detector, Infer, by 12.86% and 43.13%, respectively. For removing false-positive warnings, ChatGPT reaches a precision of up to 93.88% for Null Dereference bugs and 63.33% for Resource Leak bugs, surpassing existing state-of-the-art false-positive warning removal tools.

DOI

https://doi.org/10.1145/3664646.3664777

Mohammad Mahdi Mohajer

York University

Canada

Reem Aleithan

York University, Canada

Saudi Arabia

Nima Shiri Harzevili

York University

Canada

Moshi Wei

York University

Canada

Alvine Boaye Belle

York University

Canada

Hung Viet Pham

York University

Canada

Song Wang

York University

Canada

Session Program

Tue 16 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30	Industry Talk4 + AIware for Software Lifecycle Activities	Main Track / Industry Statements and Demo Track / Late Breaking Arxiv Track at Mandacaru Chair(s): Filipe Cogo Centre for Software Excellence, Huawei Canada

14:00 20m Industry talk		AI in Software Engineering at Google: Progress and the Path Ahead Industry Statements and Demo Track Satish Chandra Google, Inc
14:20 10m Paper		A Comparative Analysis of Large Language Models for Code Documentation Generation Main Track Shubhang Shekhar Dvivedi IIIT Delhi, Vyshnav Vijay IIIT Delhi, Sai Leela Rahul Pujari IIIT Delhi, Shoumik Lodh IIIT Delhi, Dhruv Kumar Indraprastha Institute of Information Technology, Delhi
14:30 10m Paper		AI-Assisted Assessment of Coding Practices in Modern Code Review Main Track Manushree Vijayvergiya Google, Malgorzata Salawa Google, Ivan Budiselic Google, Dan Zheng Google DeepMind, Pascal Lamblin Google, Marko Ivanković Google; Universität Passau, Juanjo Carin Google, Mateusz Lewko Google Inc, Jovan Andonov Google, Goran Petrović Google Inc, Danny Tarlow Google, Petros Maniatis Google DeepMind, René Just University of Washington
14:40 10m Paper		The Role of Generative AI in Software Development Productivity: A Pilot Case Study Main Track Mariana Coutinho CESAR School, Lorena Marques CESAR School, Anderson Santos CESAR School, Marcio Dahia CESAR School, Cesar França CESAR School, Ronnie de Souza Santos University of Calgary
14:50 10m Paper		Effectiveness of ChatGPT for Static Analysis: How Far Are We? Main Track Mohammad Mahdi Mohajer York University, Reem Aleithan York University, Canada, Nima Shiri Harzevili York University, Moshi Wei York University, Alvine Boaye Belle York University, Hung Viet Pham York University, Song Wang York University
15:00 5m Paper		Addressing Compiler Errors: Stack Overflow or Large Language Models? Late Breaking Arxiv Track Patricia Widjojo The University of Melbourne, Christoph Treude Singapore Management University
15:05 25m Live Q&A		Session Q&A and topic discussions Main Track