A Comparative Analysis of Large Language Models for Code Documentation Generation (AIware 2024 - Main Track)

Who

Shubhang Shekhar Dvivedi, Vyshnav Vijay, Sai Leela Rahul Pujari, Shoumik Lodh, Dhruv Kumar

Track

AIware 2024 Main Track

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

When

Tue 16 Jul 2024, 14:20 - 14:30, at Mandacaru - Industry Talk4 + AIware for Software Lifecycle Activities. Chair(s): Filipe Cogo

Abstract

This paper presents a comprehensive comparative analysis of Large Language Models (LLMs) for generating code documentation. Code documentation is an essential part of the software writing process. The paper evaluates models such as GPT-3.5, GPT-4, Bard, Llama 2, and StarChat on parameters including Accuracy, Completeness, Relevance, Understandability, Readability, and Time Taken, for different levels of code documentation. Our evaluation employs a checklist-based system to minimize subjectivity, providing a more objective assessment. We find that, barring StarChat, all LLMs consistently outperform the original documentation. Notably, the closed-source models GPT-3.5, GPT-4, and Bard exhibit superior performance across various parameters compared to the open-source/source-available LLMs, namely Llama 2 and StarChat. In terms of generation time, GPT-4 took the longest by a significant margin, followed by Llama 2 and then Bard, while GPT-3.5 and StarChat had comparable generation times. Additionally, file-level documentation performed considerably worse across all parameters (except time taken) than inline and function-level documentation.

DOI

https://doi.org/10.1145/3664646.3664765

Shubhang Shekhar Dvivedi

IIIT Delhi

India

Vyshnav Vijay

IIIT Delhi

India

Sai Leela Rahul Pujari

IIIT Delhi

India

Shoumik Lodh

IIIT Delhi

India

Dhruv Kumar

Indraprastha Institute of Information Technology, Delhi

India

Session Program

Tue 16 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30	Industry Talk4 + AIware for Software Lifecycle Activities (Main Track / Industry Statements and Demo Track / Late Breaking Arxiv Track) at Mandacaru. Chair(s): Filipe Cogo, Centre for Software Excellence, Huawei Canada

14:00 20m Industry talk		AI in Software Engineering at Google: Progress and the Path Ahead Industry Statements and Demo Track Satish Chandra Google, Inc
14:20 10m Paper		A Comparative Analysis of Large Language Models for Code Documentation Generation Main Track Shubhang Shekhar Dvivedi IIIT Delhi, Vyshnav Vijay IIIT Delhi, Sai Leela Rahul Pujari IIIT Delhi, Shoumik Lodh IIIT Delhi, Dhruv Kumar Indraprastha Institute of Information Technology, Delhi
14:30 10m Paper		AI-Assisted Assessment of Coding Practices in Modern Code Review Main Track Manushree Vijayvergiya Google, Malgorzata Salawa Google, Ivan Budiselic Google, Dan Zheng Google DeepMind, Pascal Lamblin Google, Marko Ivanković Google; Universität Passau, Juanjo Carin Google, Mateusz Lewko Google Inc, Jovan Andonov Google, Goran Petrović Google Inc, Danny Tarlow Google, Petros Maniatis Google DeepMind, René Just University of Washington
14:40 10m Paper		The Role of Generative AI in Software Development Productivity: A Pilot Case Study Main Track Mariana Coutinho CESAR School, Lorena Marques CESAR School, Anderson Santos CESAR School, Marcio Dahia CESAR School, Cesar França CESAR School, Ronnie de Souza Santos University of Calgary
14:50 10m Paper		Effectiveness of ChatGPT for Static Analysis: How Far Are We? Main Track Mohammad Mahdi Mohajer York University, Reem Aleithan York University, Canada, Nima Shiri Harzevili York University, Moshi Wei York University, Alvine Boaye Belle York University, Hung Viet Pham York University, Song Wang York University
15:00 5m Paper		Addressing Compiler Errors: Stack Overflow or Large Language Models? Late Breaking Arxiv Track Patricia Widjojo The University of Melbourne, Christoph Treude Singapore Management University
15:05 25m Live Q&A		Session Q&A and topic discussions Main Track