This document reports on our experience of building an “agentic AI” (Artificial Intelligence) that helps a human answer logical questions in a trustworthy way. The agent combines a Large Language Model (LLM), which interacts with the human in natural language, with logic software that automatically proves formal theorems. The LLM engages in a dialogue with the human in order to translate their logical question from natural language into a formal proof problem. Once the human is satisfied with the formalization, the LLM invokes the prover to solve the problem automatically and thus answer the question; the LLM then offers the user the possibility of inspecting the successful proof or the unsuccessful proof attempt by calling the prover in an interactive mode. Furthermore, we describe how much of the source code (which is based on the agent construction framework LangChain) has been “vibe coded”, i.e., itself generated with the help of an LLM.
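The workflow described above (agree on a formalization, then call the prover, then inspect the outcome) can be illustrated with a minimal sketch. It is not the authors' implementation: it does not use LangChain or a real theorem prover, and all names (`Formalization`, `ProofResult`, `toy_prover`, `answer_question`) are hypothetical. A truth-table check over propositional formulas stands in for the prover, and a callback stands in for the human's approval of the formalization.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Dict, List, Optional


@dataclass
class Formalization:
    """Hypothetical stand-in for the formal problem agreed on in the dialogue."""
    variables: List[str]
    formula: Callable[[Dict[str, bool]], bool]


@dataclass
class ProofResult:
    proved: bool
    counterexample: Optional[Dict[str, bool]]  # None when the formula is valid


def toy_prover(problem: Formalization) -> ProofResult:
    """Truth-table check; an assumed toy replacement for a real prover."""
    for values in product([False, True], repeat=len(problem.variables)):
        assignment = dict(zip(problem.variables, values))
        if not problem.formula(assignment):
            # A falsifying assignment plays the role of the failed proof attempt.
            return ProofResult(False, assignment)
    return ProofResult(True, None)


def answer_question(
    problem: Formalization,
    user_accepts: Callable[[Formalization], bool],
) -> Optional[ProofResult]:
    """Invoke the prover only after the human has approved the formalization."""
    if not user_accepts(problem):
        return None  # in a real agent, the dialogue would continue here
    return toy_prover(problem)


# "If it rains, the street is wet; it rains; hence the street is wet."
modus_ponens = Formalization(
    ["rain", "wet"],
    lambda v: not ((not v["rain"] or v["wet"]) and v["rain"]) or v["wet"],
)

# "If it rains, the street is wet; the street is wet; hence it rains."
affirming_consequent = Formalization(
    ["rain", "wet"],
    lambda v: not ((not v["rain"] or v["wet"]) and v["wet"]) or v["rain"],
)

valid = answer_question(modus_ponens, user_accepts=lambda p: True)
invalid = answer_question(affirming_consequent, user_accepts=lambda p: True)
rejected = answer_question(modus_ponens, user_accepts=lambda p: False)
```

In this sketch the first argument is proved, the second yields a counterexample that the user could inspect, and the prover is never called when the formalization is rejected.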
Wolfgang Schreiner (Thu,) studied this question.