Metal-organic frameworks (MOFs) offer vast potential for numerous applications, but their deceptively simple synthesis remains poorly understood. This gap hinders the translation of MOF designs proposed by computational high-throughput screening into successful syntheses, and it slows MOF development in general. In addition, much of the practical chemical intuition about MOF synthesis remains embedded in natural language across thousands of MOF reports. These limitations call for research into a data-driven understanding of MOF synthesis and into MOF synthesis automation. This paper reports novel research on a framework, guided by MOF experts, that leverages large language models (LLMs) to extract and codify MOF synthesis procedures from the experimental literature in a sequence-aware manner. Specifically, we developed an end-to-end pipeline that combines literature matching, synthesis paragraph classification, and prompt-based entity and relation extraction using GPT-4. Guided by MOF experts, we designed a comprehensive, FAIR-compliant synthesis codification schema that captures synthesis actions, precursors, conditions, and their interrelations as a sequence-aware directed graph. Our model achieves high accuracy in synthesis paragraph classification (F1 score: 0.93) and in entity and relation extraction (F1 scores: 0.96 and 0.94, respectively). This work enables large-scale, structured synthesis data collection and paves the way for AI-assisted synthesis prediction and knowledge discovery in materials science.
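To make the idea of a sequence-aware directed graph concrete, the following is a minimal sketch of how a codified synthesis might be represented. It is an illustration only, not the schema developed in this work: the class names, the node kinds ("action", "precursor", "condition"), the relation labels ("next", "uses", "under"), and the example procedure are all assumptions introduced here.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: str
    kind: str   # hypothetical labels: "action", "precursor", "condition"
    text: str


@dataclass
class SynthesisGraph:
    nodes: dict = field(default_factory=dict)
    edges: list = field(default_factory=list)  # (source_id, relation, target_id)

    def add_node(self, node_id, kind, text):
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, source_id, relation, target_id):
        if source_id not in self.nodes or target_id not in self.nodes:
            raise KeyError("add both nodes before linking them")
        self.edges.append((source_id, relation, target_id))

    def action_sequence(self):
        """Recover step order by following 'next' edges from the first action.

        Assumes a single linear chain of actions (a simplification).
        """
        successor = {s: t for s, rel, t in self.edges if rel == "next"}
        has_predecessor = set(successor.values())
        starts = [n for n, node in self.nodes.items()
                  if node.kind == "action" and n not in has_predecessor]
        if len(starts) != 1:
            raise ValueError("expected exactly one starting action")
        order, current = [], starts[0]
        while current is not None and current not in order:
            order.append(current)
            current = successor.get(current)
        return order

    def attached(self, action_id, relation):
        """Texts of nodes linked from an action by the given relation."""
        return [self.nodes[t].text for s, rel, t in self.edges
                if s == action_id and rel == relation]


def build_example():
    # Made-up procedure for illustration; not data from the paper.
    g = SynthesisGraph()
    g.add_node("a1", "action", "dissolve")
    g.add_node("a2", "action", "heat")
    g.add_node("a3", "action", "filter")
    g.add_node("p1", "precursor", "zinc nitrate hexahydrate")
    g.add_node("p2", "precursor", "terephthalic acid")
    g.add_node("c1", "condition", "120 C")
    g.add_node("c2", "condition", "24 h")
    g.add_edge("a1", "uses", "p1")
    g.add_edge("a1", "uses", "p2")
    g.add_edge("a2", "under", "c1")
    g.add_edge("a2", "under", "c2")
    g.add_edge("a1", "next", "a2")   # ordering edges carry the sequence
    g.add_edge("a2", "next", "a3")
    return g
```

In this sketch the ordering of steps lives in explicit "next" edges rather than in list position, so precursors and conditions can attach to individual actions without disturbing the sequence. Calling `build_example().action_sequence()` walks those edges to recover the step order.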
Zhao et al. studied this question.