AI-assisted User Intent Formalization for Programs: Problem and Applications
The correctness of a program is always as good as the specification that is verified. However, writing down formal specifications is non-trivial for mainstream developers. This limits the usage of rigorous testing and verification techniques to improve the quality of code, including those generated by AI.
In this talk, I will describe ongoing work on deriving formal specifications from informal source of intent such as natural language docstrings, API documentation and RFC. We describe different ways to formalize the intent as declarative specifications starting with tests, postconditions (in mainstream languages such as Python to verification-aware languages such as Dafny), to data format specifications in 3D language. We formalize metrics to evaluate the quality of various LLM-based techniques using techniques inspired by mutation testing, but with novel encoding requiring mutant generation or invoking formal verifiers. We describe our current progress on creating benchmarks for these tasks.
We will focus on two applications of specification generation: first, we will demonstrate the use of postcondition generation for Java to discover several historical bugs in the Defects4J benchmark. Second, we demonstrate the use of agents for translating informal intent in RFCs into formal data-format-specifications in the 3D language, from which verified parsers can be constructed. We will outline the outstanding challenges in this area and hope to engage in active discussions with the participants of the PL/SE community.