Abstract

In this research, our objectives are to contribute original work toward standardizing data quality measurement, to develop a genuine data structure for machine learning (ML), to alter the design space, to introduce a novel coefficient-based masking process that can optimize the target yield, and to build a robust materials informatics (MI) platform. We gathered a 520 × 260 matrix of laboratory data (260 multivariate features) and conducted a series of experiments supported by various ML design patterns. Our research focuses on the application of informatics in materials science, supported by innovations in measurement and optimization processes. It also provides an overview of some of the successful data-driven MI strategies undertaken in this decade. Supported by experiments targeting the optimization of the yield, the research also identifies challenges that the community currently faces and should overcome in the near term, and it streamlines a genuine MI process. These areas include data foundation, data quality, model evaluation, feature and cluster validation, and the optimization of target yield and likelihood.
Wu et al. (Sun) studied this question.
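The abstract names data quality measurement as an objective but does not define its metrics at this point. As an illustration only, the sketch below assumes two common baseline checks, per-feature completeness and constant-feature detection, applied to a row-major matrix such as the 520 × 260 laboratory table. The function names, the convention that missing values are `None` or NaN, and the 0.9 completeness threshold are all hypothetical and are not the authors' method.

```python
import math
from dataclasses import dataclass
from typing import List, Optional

# Row-major table: one list per sample, one entry per feature.
# Assumption: missing values are stored as None or float("nan").
Matrix = List[List[Optional[float]]]


@dataclass
class ColumnQuality:
    column: int
    completeness: float  # fraction of rows holding a usable (non-missing) value
    is_constant: bool    # True if the usable values contain at most one distinct value


def column_quality(data: Matrix) -> List[ColumnQuality]:
    """Compute completeness and constancy for every feature column."""
    if not data:
        return []
    n_rows = len(data)
    n_cols = len(data[0])
    if any(len(row) != n_cols for row in data):
        raise ValueError("all rows must have the same number of features")
    report = []
    for j in range(n_cols):
        # Keep only values that are neither None nor NaN.
        present = [row[j] for row in data
                   if row[j] is not None and not math.isnan(row[j])]
        report.append(ColumnQuality(
            column=j,
            completeness=len(present) / n_rows,
            is_constant=len(set(present)) <= 1,
        ))
    return report


def usable_columns(data: Matrix, min_completeness: float = 0.9) -> List[int]:
    """Indices of columns that are complete enough and not constant (hypothetical filter)."""
    return [q.column for q in column_quality(data)
            if q.completeness >= min_completeness and not q.is_constant]


# Tiny hypothetical example: 4 samples x 3 features.
example: Matrix = [
    [1.0, 5.0, None],
    [2.0, 5.0, 3.0],
    [3.0, 5.0, float("nan")],
    [4.0, 5.0, 4.0],
]

if __name__ == "__main__":
    for q in column_quality(example):
        print(q)
    print(usable_columns(example))
```

In this toy table, column 1 is flagged as constant and column 2 is only half complete, so only column 0 passes the hypothetical filter. A real MI pipeline would likely add further checks (outliers, unit consistency, duplicate records) on top of these two baselines.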