Microbial enzyme production and catalysis systems are a crucial aspect of biotechnological research. However, building them from trustworthy published experimental data presents a major obstacle for both manual and automated techniques. Here, we introduce MEPAM (Microbial Enzyme Production and Catalytic Activity based on LLM), a question-answering system designed to accurately address inquiries related to enzyme production and catalytic reactions. Specifically, by training three machine learning models with >0.98 accuracy, we identified 11,068 high-quality, relevant articles from the Web of Science. Leveraging DeepSeek-V3 with zero-shot learning, we developed an ontology-driven knowledge representation that extracted 12,434 entities and 35,918 relations with 0.78 extraction accuracy, and we constructed a structured knowledge graph from them. Compared to few-shot learning and other machine learning methods, our framework achieved significantly higher extraction accuracy. Using this framework, we developed MEPAM based on retrieval-augmented generation and prompt engineering. Finally, using MEPAM, we extracted a comprehensive network covering the expression profiles, precise culture conditions, and substrate preferences for cellulase, demonstrating the strong utility of this tool. Compared with traditional LLMs, particularly GPT-4o, MEPAM exhibited superior performance, achieving significantly higher answer accuracy (0.86 vs. 0.52) and nearly eliminating hallucinations. MEPAM is available at http://180.76.108.212. This framework provides context-rich, verifiable insights, thus bridging predictive modeling with experimental validation to facilitate the exploration of microbial enzymatic systems.
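The retrieval-augmented generation step described above can be illustrated with a minimal sketch. It is not MEPAM's implementation: the `Triple` type, the `retrieve` and `build_prompt` helpers, the substring-matching retrieval, and the three toy triples are all assumptions made for illustration, and no LLM is actually called. The sketch only shows the general idea of grounding a prompt in facts retrieved from a knowledge graph.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Triple:
    """One (head, relation, tail) edge of a knowledge graph."""
    head: str
    relation: str
    tail: str


# Illustrative placeholder triples; NOT taken from the MEPAM knowledge graph.
TOY_GRAPH = [
    Triple("cellulase", "produced_by", "Trichoderma reesei"),
    Triple("cellulase", "acts_on", "cellulose"),
    Triple("amylase", "acts_on", "starch"),
]


def retrieve(question, graph):
    """Return triples whose head or tail entity is mentioned in the question.

    Plain case-insensitive substring matching stands in for a real retriever.
    """
    q = question.lower()
    return [t for t in graph if t.head.lower() in q or t.tail.lower() in q]


def build_prompt(question, graph):
    """Assemble a prompt that restricts the model to the retrieved facts."""
    facts = retrieve(question, graph)
    lines = [f"- {t.head} {t.relation} {t.tail}" for t in facts]
    context = "\n".join(lines) if lines else "- (no matching facts)"
    return (
        "Answer using only the facts below; "
        "say 'unknown' if they are insufficient.\n"
        f"Facts:\n{context}\n"
        f"Question: {question}"
    )


if __name__ == "__main__":
    print(build_prompt("Which organism produces cellulase?", TOY_GRAPH))
```

Restricting the prompt to retrieved facts, and instructing the model to answer "unknown" otherwise, is one common way such systems try to reduce hallucinations; the prompt wording here is hypothetical.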
Tong et al. studied this question.