Evaluating conversational assistants, such as GitHub Copilot Chat, poses a significant challenge for tool builders in the domain of Software Engineering. These assistants rely on language models and
chat-based user experiences, which makes evaluating the quality of their Human-AI conversations difficult. Existing general-purpose metrics for measuring conversational quality found
in the literature are inadequate for appraising domain-specific dialogues because they lack contextual sensitivity. In this paper, we present RUBICON, a technique for evaluating domain-specific Human-
AI conversations. RUBICON leverages large language models to generate candidate rubrics for assessing conversation quality and employs a selection process to choose a subset of rubrics based on their performance in scoring conversations. In our experiments, RUBICON effectively learns to differentiate conversation quality, achieving higher accuracy and yield rates than existing baselines.
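As a rough illustration of the generate-then-select process described above, a minimal sketch is given below. This is not the authors' implementation: the prompts, the `llm` callable, the 0-1 scoring scheme, and the separation-based selection criterion are all assumptions made for illustration.

```python
# Hypothetical sketch of a generate-then-select rubric pipeline.
# Prompts, scoring, and selection criterion are illustrative assumptions,
# not RUBICON's actual implementation.
from typing import Callable, List, Tuple


def generate_candidate_rubrics(llm: Callable[[str], str],
                               example_conversations: List[str],
                               n: int = 20) -> List[str]:
    """Ask an LLM to propose candidate rubrics (assertions about conversation quality)."""
    prompt = (
        "Propose one rubric (a single assertion) for judging the quality of a "
        "developer-assistant conversation, based on these examples:\n"
        + "\n---\n".join(example_conversations)
    )
    # One LLM call per candidate; deduplicate identical proposals.
    return list({llm(prompt) for _ in range(n)})


def score_conversation(llm: Callable[[str], str], rubric: str, conversation: str) -> float:
    """Have the LLM judge how well a conversation satisfies a rubric (0.0 to 1.0)."""
    answer = llm(
        f"Rubric: {rubric}\nConversation:\n{conversation}\n"
        "On a scale of 0 to 1, how well does the conversation satisfy the rubric?"
    )
    try:
        return max(0.0, min(1.0, float(answer.strip())))
    except ValueError:
        return 0.0


def select_rubrics(llm: Callable[[str], str],
                   candidates: List[str],
                   labelled: List[Tuple[str, bool]],  # (conversation, is_good)
                   k: int = 5) -> List[str]:
    """Keep the k rubrics whose scores best separate good from bad conversations."""
    def separation(rubric: str) -> float:
        good = [score_conversation(llm, rubric, c) for c, ok in labelled if ok]
        bad = [score_conversation(llm, rubric, c) for c, ok in labelled if not ok]
        if not good or not bad:
            return 0.0
        return sum(good) / len(good) - sum(bad) / len(bad)

    return sorted(candidates, key=separation, reverse=True)[:k]
```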