AIware 2024
Mon 15 - Tue 16 July 2024 Porto de Galinhas, Brazil
co-located with FSE 2024
Mon 15 Jul 2024 14:40 - 14:50 at Mandacaru - Industry Talk2 + Human AI Conversation. Chair(s): Qinghua Lu

Evaluating conversational assistants, such as GitHub Copilot Chat, poses a significant challenge for tool builders in the domain of Software Engineering. These assistants rely on language models and chat-based user experiences, rendering their evaluation with respect to the quality of the Human-AI conversations complicated. Existing general-purpose metrics for measuring conversational quality found in literature are inadequate for appraising domain-specific dialogues due to their lack of contextual sensitivity. In this paper, we present RUBICON, a technique for evaluating domain-specific Human-AI conversations. RUBICON leverages large language models to generate candidate rubrics for assessing conversation quality and employs a selection process to choose the subset of rubrics based on their performance in scoring conversations. In our experiments, RUBICON effectively learns to differentiate conversation quality, achieving higher accuracy and yield rates than existing baselines.
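The abstract does not describe how the selection step works, so the following is only an illustrative sketch of one way to choose a subset of rubrics by how well they score conversations. It is not the authors' algorithm. Everything in it is an assumption made for illustration: the rubric texts, the per-conversation scores (which a language model would produce in a real setup), the good/bad labels, the 0.5 threshold, and the greedy strategy.

```python
from statistics import mean

# Hypothetical precomputed scores: SCORES[rubric][i] says how strongly
# conversation i satisfies the rubric. These numbers are made up.
SCORES = {
    "assistant resolves the user's error": [0.9, 0.8, 0.2, 0.1],
    "assistant asks for missing context": [0.7, 0.6, 0.4, 0.3],
    "conversation is long": [0.5, 0.4, 0.6, 0.5],
}
# Hypothetical labels: True means the conversation was judged good.
LABELS = [True, True, False, False]


def accuracy(subset, scores, labels, threshold=0.5):
    """Share of conversations classified correctly by the mean score over `subset`."""
    correct = 0
    for i, label in enumerate(labels):
        predicted_good = mean(scores[r][i] for r in subset) >= threshold
        correct += predicted_good == label
    return correct / len(labels)


def select_rubrics(scores, labels, max_size=2):
    """Greedily add the rubric that most improves accuracy; stop when nothing helps."""
    chosen, best = [], 0.0
    while len(chosen) < max_size:
        candidates = [r for r in scores if r not in chosen]
        if not candidates:
            break
        gains = {r: accuracy(chosen + [r], scores, labels) for r in candidates}
        top = max(gains, key=gains.get)
        if gains[top] <= best:
            break  # no remaining rubric improves the current subset
        chosen.append(top)
        best = gains[top]
    return chosen, best


if __name__ == "__main__":
    print(select_rubrics(SCORES, LABELS))
```

On this toy data the first rubric already separates the good conversations from the bad ones, so the greedy loop stops after one pick. A rubric such as "conversation is long", which does not track quality, is never selected.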

Mon 15 Jul

Displayed time zone: Brasilia, Distrito Federal, Brazil

14:00 - 15:30
Industry Talk2 + Human AI Conversation
Main Track / Industry Statements and Demo Track at Mandacaru
Chair(s): Qinghua Lu Data61, CSIRO
14:00
20m
Industry talk
AI Assistant in JetBrains IDE: Insights and Challenges
Industry Statements and Demo Track
Andrey Sokolov JetBrains Research
14:20
10m
Paper
Unveiling the Potential of a Conversational Agent in Developer Support: Insights from Mozilla’s PDF.js Project
Main Track
João Correia PUC-Rio, Morgan C. Nicholson University of São Paulo, Daniel Coutinho Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Caio Barbosa Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Marco Castelluccio Mozilla, Marco Gerosa Northern Arizona University, Alessandro Garcia Pontifical Catholic University of Rio de Janeiro (PUC-Rio), Igor Steinmacher Northern Arizona University
14:30
10m
Paper
From Human-to-Human to Human-to-Bot Conversations in Software Engineering
Main Track
Ranim Khojah Chalmers | University of Gothenburg, Francisco Gomes de Oliveira Neto Chalmers | University of Gothenburg, Philipp Leitner Chalmers | University of Gothenburg
14:40
10m
Paper
RUBICON: Rubric-Based Evaluation of Domain-Specific Human AI Conversations
Main Track
Param Biyani Microsoft, Yasharth Bajpai Microsoft, Arjun Radhakrishna Microsoft, Gustavo Soares Microsoft, Sumit Gulwani Microsoft
14:50
5m
Paper
Unveiling Assumptions: Exploring the Decisions of AI Chatbots and Human Testers
Main Track
Francisco Gomes de Oliveira Neto Chalmers | University of Gothenburg
14:55
35m
Live Q&A
Session Q&A and topic discussions
Main Track