What question did this study set out to answer?

The central aim is to improve the screening of metal–organic frameworks (MOFs) for carbon capture by integrating crystal-structure data with literature-derived language knowledge.

April 24, 2026 · Open Access

MOFMeld: a structure–language fusion framework for MOF property prediction in carbon capture

Key Points

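Developed MOFMeld, a structure–language fusion framework that links a MOF-specialized language model to crystal-aware structural embeddings through a lightweight bridge module.
Adapted the language model, MOFLLaMA, from LLaMA-3.1-8B-Instruct by supervised fine-tuning on ~20,000 MOF question-answer pairs distilled from ~1500 publications.
Encoded structural information from CIF files and aligned it to the language space; at inference, a MOF knowledge graph grounds the model's answers.
Achieved accuracy competitive with or superior to a strong graph neural network (GNN) baseline across six property targets, despite using substantially less training data.
Showed through UMAP analyses that the learned embeddings organize structure–property relationships coherently.
Enabled continual knowledge updates through an automated literature pipeline.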
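
The Key Points mention encoding structural information from CIF files. The summary does not say which structural encoder MOFMeld uses, so the sketch below only illustrates a generic first step of reading a CIF: it pulls the six unit-cell parameters (standard CIF tags such as `_cell_length_a` and `_cell_angle_alpha`) and computes the cell volume with the general triclinic formula. The helper names `parse_cell` and `cell_volume` are hypothetical, and the parser assumes each tag and its value share one line, which real CIF files do not guarantee.

```python
import math

# Standard CIF tags for the unit cell (lengths in angstroms, angles in degrees).
CELL_TAGS = (
    "_cell_length_a", "_cell_length_b", "_cell_length_c",
    "_cell_angle_alpha", "_cell_angle_beta", "_cell_angle_gamma",
)


def parse_cell(cif_text: str) -> dict[str, float]:
    """Hypothetical helper: read the six cell parameters from CIF text.

    Simplification (assumption): each tag and its value sit on the same line.
    Values such as "10.123(4)" carry a standard uncertainty, which is dropped.
    """
    cell: dict[str, float] = {}
    for line in cif_text.splitlines():
        parts = line.split()
        if len(parts) >= 2 and parts[0] in CELL_TAGS:
            cell[parts[0]] = float(parts[1].split("(")[0])
    missing = [tag for tag in CELL_TAGS if tag not in cell]
    if missing:
        raise ValueError(f"missing cell tags: {missing}")
    return cell


def cell_volume(cell: dict[str, float]) -> float:
    """Unit-cell volume for a general (triclinic) cell, in cubic angstroms."""
    a, b, c = (cell[tag] for tag in CELL_TAGS[:3])
    ca, cb, cg = (math.cos(math.radians(cell[tag])) for tag in CELL_TAGS[3:])
    return a * b * c * math.sqrt(1 - ca**2 - cb**2 - cg**2 + 2 * ca * cb * cg)


if __name__ == "__main__":
    toy_cif = """data_toy
_cell_length_a    10.0(1)
_cell_length_b    10.0
_cell_length_c    10.0
_cell_angle_alpha 90
_cell_angle_beta  90
_cell_angle_gamma 90
"""
    print(cell_volume(parse_cell(toy_cif)))  # prints 1000.0
```

For orthogonal cells the formula reduces to a·b·c, and for a monoclinic cell to a·b·c·sin β; a learned encoder would go much further (atom positions, bonding, pore geometry), but every such encoder starts from these lattice parameters.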

Abstract

Efficient carbon capture requires sorbents that combine high CO2 affinity, stability, and low regeneration energy. Metal–organic frameworks (MOFs) are promising candidates, but screening them efficiently remains a significant challenge: performance is governed by crystal topology, relevant data are scattered across the literature, and conventional experimental or computational methods are time-intensive and data-limited. To address this, we introduce MOFMeld, a structure–language fusion framework that integrates a literature-grounded, MOF-specialized large language model (MOFLLaMA) with crystal-aware structural embeddings via a lightweight bridge module. MOFLLaMA is adapted from LLaMA-3.1-8B-Instruct by supervised fine-tuning on ~20,000 MOF question-answer pairs distilled from ~1500 publications and, at inference, is grounded by a MOF knowledge graph to support factual, traceable reasoning. Structural information is encoded from CIF files and aligned to the language space, enabling structure-conditioned question answering and property prediction. Evaluated on six key targets (pore-limiting diameter, largest cavity diameter, surface area, void fraction, and CO2 uptake at 2.5 and 0.01 bar), MOFMeld achieves accuracy competitive with or superior to a strong graph neural network (GNN) baseline despite using substantially less training data. UMAP analyses reveal coherent organization of structure–property relationships within the learned embeddings, enhancing model interpretability. An automated literature pipeline further enables continual knowledge updates. Collectively, MOFMeld offers a scalable and transparent pathway toward literature-aware, structure-informed MOF screening for carbon capture applications.
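
The abstract describes a lightweight bridge module that aligns structural embeddings to the language space but does not specify its architecture. As an assumption, the sketch below uses a single linear projection that turns one structure embedding into a fixed number of "soft tokens" sized to the language model's hidden dimension, which are then prepended to the text-token embeddings. This is a common pattern in multimodal language models, not necessarily MOFMeld's design. `LinearBridge`, `fuse`, and `n_tokens` are hypothetical names, and the weights are random rather than trained.

```python
import random


class LinearBridge:
    """Hypothetical bridge: one structure embedding -> n_tokens soft tokens.

    Assumption: a single linear layer (no hidden layers); in a real system
    the weights would be learned, here they are random for illustration.
    """

    def __init__(self, d_struct: int, d_model: int, n_tokens: int = 1, seed: int = 0):
        rng = random.Random(seed)
        self.d_struct = d_struct
        self.d_model = d_model
        self.n_tokens = n_tokens
        out_dim = d_model * n_tokens
        scale = 1.0 / d_struct ** 0.5  # keep output magnitudes modest
        self.weight = [[rng.gauss(0.0, scale) for _ in range(d_struct)]
                       for _ in range(out_dim)]
        self.bias = [0.0] * out_dim

    def __call__(self, struct_emb: list[float]) -> list[list[float]]:
        if len(struct_emb) != self.d_struct:
            raise ValueError("structure embedding has the wrong length")
        # Matrix-vector product plus bias: flat output of length d_model * n_tokens.
        flat = [sum(w * x for w, x in zip(row, struct_emb)) + b
                for row, b in zip(self.weight, self.bias)]
        # Split the flat vector into n_tokens soft tokens of size d_model.
        return [flat[i * self.d_model:(i + 1) * self.d_model]
                for i in range(self.n_tokens)]


def fuse(soft_tokens: list[list[float]], text_embs: list[list[float]]) -> list[list[float]]:
    """Prepend structure soft tokens to the text-token embedding sequence."""
    if soft_tokens and text_embs and len(soft_tokens[0]) != len(text_embs[0]):
        raise ValueError("soft tokens and text embeddings differ in width")
    return soft_tokens + text_embs


if __name__ == "__main__":
    bridge = LinearBridge(d_struct=8, d_model=16, n_tokens=4)
    soft = bridge([0.1 * i for i in range(8)])
    text = [[0.0] * 16 for _ in range(5)]  # stand-in for 5 embedded text tokens
    sequence = fuse(soft, text)
    print(len(sequence), len(sequence[0]))  # prints 9 16
```

Prepending soft tokens lets the language model attend to the structure like any other part of the prompt; training the projection is what would place these vectors where the model can interpret them.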
