September 10, 2025 · Open Access

A Generalized Framework for Data‐Efficient and Extrapolative Materials Discovery for Gas Separation

Key Points

Training on only 5%–10% of the total data is enough to identify top-performing materials, improving discovery efficiency.
Models successfully predict up to 97 of the top 100 best-performing materials for gas separation.
The framework relies on SML while exhibiting minimal dependence on the specific target property.
The methodology is demonstrated across multiple metal-organic framework databases, indicating robustness.

Abstract

Supervised machine learning (SML) has become integral to materials discovery because it offers a computationally cheaper way of correlating the structure of a material with its properties, a task that would otherwise require high-fidelity, resource-intensive first-principles calculations. The performance of SML models is strongly influenced by the quantity of available training data: in general, more training data leads to better model accuracy. When adequately trained, SML models act as effective low-fidelity surrogates for accelerating materials discovery, in line with the broader objective of computational materials science, which is to identify high-performing materials for a variety of target applications. In this work, we recognize the importance of data-driven model accuracy and introduce a novel framework for constructing SML models aimed at identifying top-performing materials for gas separation applications. Our approach embraces the challenge of data scarcity, seeking to discover as many high-performing candidates as possible while relying on minimal training data. We demonstrate that our iterative framework for building SML models reduces the required training dataset to only 5%–10% of the total data while successfully identifying up to 97 of the top 100 best-performing materials. Furthermore, we show that the framework depends only weakly on the choice of SML model and exhibits minimal dependence on the specific target property under investigation. Using this approach, we identify top-performing candidates for three industry-relevant gas separations in multiple metal-organic framework databases, thereby highlighting the robustness and general applicability of our workflow.
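The abstract does not spell out how the iterative framework chooses which materials to label, so the following is only a minimal sketch of one common way such a loop can be organized, not the authors' method. It assumes a greedy rule (label the candidates the current model ranks highest), uses a closed-form ridge regression as a stand-in for the SML model, and replaces the metal-organic framework database and its simulated target property with synthetic data. All names (`fit_ridge`, `iterative_search`, `budget`, `batch`) are hypothetical.

```python
import numpy as np


def fit_ridge(X, y, alpha=1e-3):
    """Closed-form ridge regression; the intercept column is also
    lightly regularized to keep the sketch short."""
    A = np.hstack([X, np.ones((X.shape[0], 1))])
    return np.linalg.solve(A.T @ A + alpha * np.eye(A.shape[1]), A.T @ y)


def predict(w, X):
    """Apply the fitted weights (including intercept) to descriptors X."""
    return np.hstack([X, np.ones((X.shape[0], 1))]) @ w


def iterative_search(X, y_oracle, budget, n_init=20, batch=20, seed=0):
    """Greedy iterative labeling (an assumed acquisition rule).

    X        : descriptors for every candidate material
    y_oracle : the expensive property; only entries whose index has been
               'labeled' are ever read, mimicking an on-demand calculation
    budget   : total number of materials we may label
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    # Start from a small random sample of the database.
    labeled = rng.choice(n, size=n_init, replace=False).tolist()
    while len(labeled) < budget:
        w = fit_ridge(X[labeled], y_oracle[labeled])
        scores = predict(w, X)
        scores[labeled] = -np.inf  # never re-select labeled materials
        take = min(batch, budget - len(labeled))
        # Label the candidates the current model ranks highest.
        labeled.extend(np.argsort(scores)[-take:].tolist())
    return np.array(labeled)


# Synthetic stand-in for a materials database (assumption, not real data).
rng = np.random.default_rng(42)
X = rng.normal(size=(2000, 5))
y = X @ np.array([1.0, -0.5, 0.8, 0.3, -1.2]) + 0.1 * rng.normal(size=2000)

budget = int(0.10 * len(y))  # label at most 10% of the database
labeled = iterative_search(X, y, budget)

top100 = set(np.argsort(y)[-100:].tolist())
found = len(top100 & set(labeled.tolist()))
print(f"labeled {len(labeled)} of {len(y)}; recovered {found} of the true top 100")
```

The figure of merit here mirrors the one quoted in the abstract: how many of the true top 100 materials end up in the labeled set after spending a fixed fraction of the database. On a toy linear target the greedy rule should do far better than random sampling, which would recover about 10 of the top 100 at a 10% budget; real structure-property data is harder, and the numbers from this sketch say nothing about the paper's results.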
