Efficient carbon capture requires sorbents that combine high CO2 affinity, stability, and low regeneration energy. While metal–organic frameworks (MOFs) are promising candidates, their efficient screening remains a significant challenge: performance is governed by crystal topology, yet relevant data is scattered across the literature, and conventional experimental or computational methods are time-intensive and data-limited. To address this, we introduce MOFMeld, a structure–language fusion framework that integrates a literature-grounded, MOF-specialized large language model (MOFLLaMA) with crystal-aware structural embeddings via a lightweight bridge module. MOFLLaMA is adapted from LLaMA-3.1-8B-Instruct by supervised fine-tuning on ~20,000 MOF question-answer pairs distilled from ~1,500 publications and, at inference, is grounded by a MOF knowledge graph to support factual, traceable reasoning. Structural information is encoded from CIF files and aligned to the language space, enabling structure-conditioned question answering and property prediction. Evaluated across six key targets (pore-limiting diameter, largest cavity diameter, surface area, void fraction, and CO2 uptake at 2.5 and 0.01 bar), MOFMeld achieves accuracy competitive with or superior to that of a strong graph neural network (GNN) baseline despite using substantially less training data. UMAP analyses reveal coherent organization of structure–property relationships within the learned embeddings, enhancing model interpretability. An automated literature pipeline further enables continual knowledge updates. Collectively, MOFMeld offers a scalable and transparent pathway toward literature-aware, structure-informed MOF screening for carbon capture applications.
You et al. (Tue,) studied this question.
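The abstract does not specify how the bridge module is built, so the following is only an illustrative sketch of one common way to align a structural embedding with a language model's input space: a linear projection that turns one structure vector into a few "soft prefix" tokens, which are then prepended to the text token embeddings. The class name `BridgeModule`, the function `fuse`, the dimensions, and the prefix-token design are all assumptions made for illustration and are not taken from MOFMeld.

```python
import numpy as np


class BridgeModule:
    """Hypothetical linear bridge (an assumption, not the MOFMeld design).

    Maps a structure embedding of size `struct_dim` to `num_prefix_tokens`
    vectors of size `llm_dim`, the width of the language model's token
    embeddings.
    """

    def __init__(self, struct_dim, llm_dim, num_prefix_tokens, seed=0):
        rng = np.random.default_rng(seed)
        self.llm_dim = llm_dim
        self.num_prefix_tokens = num_prefix_tokens
        # Small random weights stand in for parameters that would be trained.
        self.weight = rng.normal(
            0.0, 0.02, size=(struct_dim, num_prefix_tokens * llm_dim)
        )
        self.bias = np.zeros(num_prefix_tokens * llm_dim)

    def __call__(self, struct_emb):
        # struct_emb has shape (batch, struct_dim).
        projected = struct_emb @ self.weight + self.bias
        # Reshape to (batch, num_prefix_tokens, llm_dim).
        return projected.reshape(
            struct_emb.shape[0], self.num_prefix_tokens, self.llm_dim
        )


def fuse(prefix_tokens, text_token_embs):
    """Prepend structure-derived prefix tokens to the text token embeddings."""
    return np.concatenate([prefix_tokens, text_token_embs], axis=1)


# Toy dimensions chosen only to keep the example small.
bridge = BridgeModule(struct_dim=16, llm_dim=32, num_prefix_tokens=4)
structure_embedding = np.ones((2, 16))       # e.g. output of a CIF encoder
question_embeddings = np.zeros((2, 5, 32))   # e.g. embedded question tokens
fused = fuse(bridge(structure_embedding), question_embeddings)
print(fused.shape)
```

In a setup like this, the fused sequence would be passed to the language model in place of ordinary token embeddings, so that answers and property predictions are conditioned on the crystal structure. Only the bridge weights would need training if the encoder and language model were kept frozen, which is one reading of "lightweight"; the abstract itself does not confirm this.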