Agents for Data Science: From Raw Data to AI-generated Notebooks Using LLMs and Code Execution (AIware 2024 - Industry Statements and Demo Track)

Track

AIware 2024 Industry Statements and Demo Track

Time Zone

The program is currently displayed in (GMT-03:00) Brasilia, Distrito Federal, Brazil.

The GMT offsets shown reflect the offsets at the moment of the conference.

When

Mon 15 Jul 2024 11:00 - 11:20 at Mandacaru - Industry Talk1 + SE for AIware Chair(s): Andreas Zeller

Abstract

Data science tasks involve a complex interplay of datasets, code, and code outputs for answering questions, deriving insights, or building models from data. Tasks and the methods chosen for them may require specialized knowledge of the data domain or the scientific domain. Queries range from high-level (low-code) to highly technical (high-code). Code execution results, such as plots and tables, are artifacts that data scientists use to interpret and reason about the current and future states of a solution as they work towards completing the task. These characteristics present unique challenges in designing, deploying, and evaluating LLM-based agents for automating data science workflows. In this talk we will introduce an end-to-end, autonomous Data Science Agent (DSA) built around Gemini and available as an experiment at labs.google/code. DSA leverages agentic flows, planning, and orchestration to tackle open-ended data science explorations. It uses LLMs for planning, task decomposition, code generation, reasoning, and error correction through code execution. DSA is designed to streamline the entire data science process, enabling users to query data in natural language and to get from a dataset and a prompt to a fully AI-generated, populated notebook. We’ll discuss design choices (prompting, SFT, orchestration), iterative development cycles, evaluation, lessons learned, and future challenges. Where applicable, we will showcase real-world case studies demonstrating how DSA can assist with bootstrapping the analysis of data from complex scientific domains.
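The abstract mentions code generation with error correction through code execution. The following is a minimal sketch of that general pattern only, not DSA's actual implementation, which the abstract does not describe. All names here (`run_cell`, `build_notebook`, `stub_generate_code`) are hypothetical, and a hard-coded stub stands in for the LLM so the example runs without any model or API.

```python
import contextlib
import io
import traceback
from typing import Callable, List


def run_cell(code: str, namespace: dict) -> dict:
    """Execute one code cell; capture its stdout, or the traceback on failure."""
    buffer = io.StringIO()
    try:
        with contextlib.redirect_stdout(buffer):
            exec(code, namespace)  # shared namespace, like notebook cells
        return {"ok": True, "output": buffer.getvalue()}
    except Exception:
        return {"ok": False, "output": traceback.format_exc()}


def build_notebook(
    steps: List[str],
    generate_code: Callable[[str, str], str],
    max_repairs: int = 2,
) -> List[dict]:
    """For each planned step: generate code, run it, and retry with the error as feedback."""
    namespace: dict = {}
    cells = []
    for step in steps:
        feedback = ""
        for _ in range(max_repairs + 1):
            code = generate_code(step, feedback)
            result = run_cell(code, namespace)
            if result["ok"]:
                break
            feedback = result["output"]  # the traceback informs the next attempt
        cells.append({"step": step, "code": code, **result})
    return cells


def stub_generate_code(step: str, feedback: str) -> str:
    """Hypothetical stand-in for an LLM call; a real agent would prompt a model here."""
    if step == "load data":
        return "data = [3, 1, 2]"
    if step == "summarize":
        if not feedback:
            return "print(sum(data) / len(dat))"  # deliberate typo to trigger a repair
        return "print(sum(data) / len(data))"
    return "pass"


notebook = build_notebook(["load data", "summarize"], stub_generate_code)
for cell in notebook:
    print(cell["step"], "->", cell["output"].strip())
```

In this sketch the first attempt at the "summarize" step fails with a `NameError`, the traceback is passed back to the generator, and the second attempt succeeds. The list of steps is assumed to come from a separate planning stage, which is not modeled here.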

Session Program

Mon 15 Jul
Displayed time zone: Brasilia, Distrito Federal, Brazil

11:00 - 12:30	Industry Talk1 + SE for AIware (Late Breaking Arxiv Track / Industry Statements and Demo Track / Main Track) at Mandacaru Chair(s): Andreas Zeller CISPA Helmholtz Center for Information Security

11:00 20m Industry talk		Agents for Data Science: From Raw Data to AI-generated Notebooks Using LLMs and Code Execution Industry Statements and Demo Track Jiahao Cai Google
11:20 10m Paper		Function+Data Flow: A Framework to Specify Machine Learning Pipelines for Digital Twinning Main Track Eduardo de Conto Nanyang Technological University; CNRS@CREATE, Blaise Genest IPAL - CNRS - CNRS@CREATE, Arvind Easwaran Nanyang Technological University DOI Pre-print
11:30 10m Paper		Green AI in Action: Strategic Model Selection for Ensembles in Production Main Track Nienke Nijkamp Delft University of Technology, June Sallou Delft University of Technology, Niels van der Heijden University of Amsterdam, Luís Cruz Delft University of Technology DOI Pre-print
11:40 5m Paper		Towards Responsible AI in the Era of Generative AI: A Reference Architecture for Designing Foundation Model based Systems Late Breaking Arxiv Track Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61, Xiwei (Sherry) Xu Data61, CSIRO, Zhenchang Xing CSIRO’s Data61; Australian National University, Jon Whittle CSIRO's Data61 and Monash University Pre-print
11:45 5m Paper		Towards Responsible Generative AI: A Reference Architecture for Designing Foundation Model based Agents Late Breaking Arxiv Track Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61, Xiwei (Sherry) Xu Data61, CSIRO, Zhenchang Xing CSIRO’s Data61; Australian National University, Stefan Harrer CSIRO's Data61, Jon Whittle CSIRO's Data61 and Monash University Pre-print
11:50 5m Paper		Agent Design Pattern Catalogue: A Collection of Architectural Patterns for Foundation Model based Agents Late Breaking Arxiv Track Yue Liu Data61, CSIRO, Sin Kit Lo CSIRO Data61, Qinghua Lu Data61, CSIRO, Liming Zhu CSIRO’s Data61, Dehai Zhao CSIRO's Data61, Xiwei (Sherry) Xu Data61, CSIRO, Stefan Harrer CSIRO's Data61, Jon Whittle CSIRO's Data61 and Monash University Pre-print
11:55 35m Live Q&A		Session Q&A and topic discussions Main Track

Jiahao Cai

Google