We developed pinax, a provenance management system that serves as an integrated platform for materials data science. The system addresses three key challenges of applying machine learning to materials research: (i) the complexity of analytical workflows, (ii) the requirement for reproducibility, and (iii) effective knowledge sharing. It comprehensively records each step of an analysis, from raw data ingestion and preprocessing to model training, evaluation, and prediction. By representing these analytical processes as ‘provenance information’ in a graph-based structure, pinax allows analytical knowledge to be accumulated and reused systematically, which facilitates more efficient data-centric materials research. Furthermore, pinax integrates with the key analytical functions and databases of the NIMS (National Institute for Materials Science) data integration and utilization platform, thereby contributing to an integrated infrastructure for materials data science. This integration advances the digital transformation (DX) of materials science by enabling more effective and seamless use of data across research activities. We demonstrate the effectiveness of the system through case studies of machine learning applications in materials engineering.
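To make the idea of graph-based provenance concrete, the following is a minimal illustrative sketch in Python and not the pinax implementation. It assumes a simple model in which each analysis step (an activity) consumes and produces named artifacts. The class `ProvenanceGraph`, its methods `record` and `lineage`, and all step and artifact names are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


class ProvenanceGraph:
    """Toy provenance store (hypothetical, not the pinax API).

    Activities consume and produce named artifacts; the resulting
    edges form a directed graph that can be walked backwards.
    """

    def __init__(self):
        self.produced_by = {}           # artifact -> activity that generated it
        self.used = defaultdict(list)   # activity -> artifacts it consumed

    def record(self, activity, inputs, outputs):
        """Register one analysis step with its input and output artifacts."""
        self.used[activity].extend(inputs)
        for artifact in outputs:
            self.produced_by[artifact] = activity

    def lineage(self, artifact):
        """Return the activities upstream of `artifact`, in depth-first order."""
        steps, stack, seen = [], [artifact], set()
        while stack:
            current = stack.pop()
            activity = self.produced_by.get(current)
            if activity is None or activity in seen:
                continue  # raw input, or a step already visited
            seen.add(activity)
            steps.append(activity)
            stack.extend(self.used[activity])
        return steps


# Assumed example workflow mirroring the steps named in the abstract.
graph = ProvenanceGraph()
graph.record("ingest", ["raw.csv"], ["table"])
graph.record("preprocess", ["table"], ["features"])
graph.record("train", ["features"], ["model"])
graph.record("predict", ["model"], ["predictions"])

print(graph.lineage("predictions"))
```

Under these assumptions, querying the lineage of a prediction walks back through training and preprocessing to the raw data, which is the kind of traceability that supports reproducing or reusing an earlier analysis.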
Minamoto et al. (Sun,) studied this question.