What question did this study set out to answer?

The aim is to develop a framework that enhances data mining from unstructured scientific literature for chemical discovery.

March 5, 2026 | Open Access

ReactionSeek: LLM-powered literature data mining and knowledge discovery in organic synthesis

Key Points

Combines large language models with established cheminformatics tools
Uses sophisticated prompt engineering, with minimal custom code, for data extraction
Validated on the century-spanning Organic Syntheses collection
Achieves over 95% precision and recall when extracting key reaction parameters
Generates a large, AI-ready dataset of chemical information
Provides an interactive Synthetic Chatbot (SynChat) for natural language querying of chemical data

Abstract

The application of artificial intelligence (AI) to chemical discovery is critically hindered by the inaccessibility of data locked within unstructured scientific literature. Existing data acquisition methods are often manual, limited in scope, or require extensive custom software development, impeding progress in leveraging AI for chemical discovery. Here, we introduce ReactionSeek, a framework that synergistically combines large language models (LLMs) with established cheminformatics tools to automate multi-modal data mining from organic synthesis literature. Through sophisticated prompt engineering with minimal custom code, ReactionSeek extracts and standardizes complex textual, graphical, and semantic chemical information. We validate this framework on the century-spanning Organic Syntheses collection, achieving over 95% precision and recall for key reaction parameters. This enables three applications: the generation of a large, AI-ready dataset; an interactive Synthetic Chatbot (SynChat) for natural language querying of chemical data; and an autonomous analysis that revealed decades-long trends in catalysis. ReactionSeek thus provides a general solution to the data curation bottleneck, representing a step forward for AI-driven archive mining and knowledge discovery across the chemical sciences.
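To make the reported precision and recall figures concrete, the sketch below shows one way such field-level scores could be computed for a single extracted reaction record against a hand-labeled reference. It is an illustration only: the field names, the example values, and the `normalize` and `precision_recall` helpers are assumptions introduced here, and the abstract does not describe the authors' actual evaluation protocol.

```python
from typing import Dict, Tuple

# Hypothetical record shape: field name -> extracted value, e.g. {"solvent": "THF"}.
Record = Dict[str, str]


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are not counted as errors."""
    return " ".join(value.lower().split())


def precision_recall(extracted: Record, reference: Record) -> Tuple[float, float]:
    """Field-level precision and recall of one extracted record against a reference record."""
    # A true positive is a field present in both records with matching normalized values.
    true_positives = sum(
        1
        for field, value in extracted.items()
        if field in reference and normalize(value) == normalize(reference[field])
    )
    # Precision: fraction of extracted fields that are correct.
    precision = true_positives / len(extracted) if extracted else 0.0
    # Recall: fraction of reference fields that were recovered correctly.
    recall = true_positives / len(reference) if reference else 0.0
    return precision, recall


# Made-up example values, not taken from the paper's dataset.
reference = {"solvent": "THF", "temperature": "0 C", "yield": "85%", "catalyst": "Pd(PPh3)4"}
extracted = {"solvent": "thf", "temperature": "0 C", "yield": "58%"}

precision, recall = precision_recall(extracted, reference)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this toy case two of the three extracted fields are correct (precision 2/3) and two of the four reference fields are recovered (recall 1/2). Scores above 95% on both measures would mean that almost every extracted value is correct and almost nothing in the reference is missed.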


Cite This Study

Li et al. studied this question.

synapsesocial.com/papers/69a91d21d6127c7a504bfecb https://doi.org/10.1038/s41467-026-70180-1
