What question did this study set out to answer?

The aim is to develop an effective framework for multiphase identification of crystalline phases in X-ray diffraction data using a machine learning approach.

April 26, 2026

XRD-VisionTransformer: Effective Multiphase Identification Framework for X-ray Diffraction Patterns

Key Points

Introduced the XRD-VisionTransformer (XViT), which uses a self-attention mechanism to model global relationships across the diffraction pattern.
Integrated a statistical positional embedding module to improve performance on smaller data sets.
Added a deep classifier tail that uses all features from the final transformer layer to better learn interpeak dependencies.
XViT outperformed both CNN and ViT models in phase identification tasks.
Demonstrated robustness in handling smaller data sets without compromising performance.
Comprehensive experiments on two inorganic data sets validated the effectiveness of the proposed methods.

Abstract

X-ray diffraction (XRD) is a powerful analytical technique for identifying crystalline phases in unknown mixtures. However, traditional phase identification methods are time-consuming and require substantial human intervention. To accelerate this process, machine learning has become increasingly important in XRD phase identification. By framing phase identification as an image classification task, Convolutional Neural Network (CNN)-based methods have achieved notable performance. However, the inherent limitation of CNNs in capturing long-range dependencies makes it difficult for them to handle multiphase identification tasks, in which the characteristic peaks may be widely separated. The Vision Transformer (ViT) architecture, with its self-attention mechanism, offers a promising alternative by effectively modeling global relationships. However, the differences between XRD data and natural images limit ViT's performance in the phase identification task. To align ViT architectures with XRD domain knowledge, we propose the XRD-VisionTransformer (XViT), a new network for multiphase identification of XRD patterns. Additionally, to address ViT's sensitivity to data size, we introduce a statistical positional embedding module in XViT that encodes crystallographic position priors using global intensity statistics rather than fully learnable embeddings. This allows the model to be applied to smaller data sets while maintaining performance. Furthermore, to better capture the interpeak dependencies, we introduce a deep classifier tail that uses all of the features in the last transformer layer. This ensures that the relationships between different characteristic peaks are well learned and gives a better identification result for phases (combinations of characteristic peaks). Comprehensive experiments on two inorganic data sets demonstrate that XViT outperforms both CNN and ViT models in XRD phase identification.
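
The two components named in the abstract can be illustrated with a rough NumPy sketch. This is not the authors' implementation: the abstract gives no architectural details, so every name and choice here is an assumption made for illustration. In particular, the sketch assumes 1D patterns split into fixed-size patches, uses the per-position mean and standard deviation of intensity as the "global intensity statistics", maps them to the embedding width with a fixed (non-learned) random projection, omits the transformer layers entirely, and uses a two-layer MLP with per-phase sigmoid outputs as the classifier tail.

```python
import numpy as np


def patchify(patterns, patch_size):
    """Split 1D patterns (n_samples, n_points) into (n_samples, n_patches, patch_size)."""
    n_samples, n_points = patterns.shape
    n_patches = n_points // patch_size
    trimmed = patterns[:, : n_patches * patch_size]  # drop any leftover points
    return trimmed.reshape(n_samples, n_patches, patch_size)


def statistical_positional_embedding(train_patches, dim, seed=0):
    """Hypothetical positional embedding built from data-set statistics.

    Assumption: each patch position is described by the mean and standard
    deviation of intensity over the whole training set, then mapped to `dim`
    features by a fixed random projection instead of learned parameters.
    """
    mean = train_patches.mean(axis=(0, 2))  # (n_patches,)
    std = train_patches.std(axis=(0, 2))    # (n_patches,)
    stats = np.stack([mean, std], axis=1)   # (n_patches, 2)
    rng = np.random.default_rng(seed)
    projection = rng.standard_normal((2, dim)) / np.sqrt(2.0)
    return stats @ projection               # (n_patches, dim)


def embed_patches(patches, w_embed, pos_embed):
    """Linear patch embedding plus the positional embedding (broadcast over samples)."""
    return patches @ w_embed + pos_embed    # (n_samples, n_patches, dim)


def deep_classifier_tail(tokens, w1, b1, w2, b2):
    """Hypothetical tail that consumes every token rather than one class token.

    Assumption: tokens are flattened, passed through one ReLU hidden layer,
    and scored with independent sigmoids (one probability per candidate phase).
    """
    flat = tokens.reshape(tokens.shape[0], -1)   # (n_samples, n_patches * dim)
    hidden = np.maximum(flat @ w1 + b1, 0.0)     # ReLU
    logits = hidden @ w2 + b2
    return 1.0 / (1.0 + np.exp(-logits))         # (n_samples, n_phases)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    patterns = rng.random((4, 120))              # synthetic stand-in for XRD patterns
    patches = patchify(patterns, patch_size=12)  # 10 patches per pattern
    dim, hidden, n_phases = 8, 16, 3
    pos = statistical_positional_embedding(patches, dim)
    w_embed = rng.standard_normal((12, dim)) * 0.1
    tokens = embed_patches(patches, w_embed, pos)  # transformer layers omitted
    w1 = rng.standard_normal((10 * dim, hidden)) * 0.1
    w2 = rng.standard_normal((hidden, n_phases)) * 0.1
    probs = deep_classifier_tail(tokens, w1, np.zeros(hidden), w2, np.zeros(n_phases))
    print(probs.shape)
```

In this sketch the positional embedding has no trainable parameters, which is one plausible reading of why such a module might ease the data requirements compared with fully learnable embeddings. Feeding all tokens to the tail lets the classifier combine evidence from widely separated patches, at the cost of a first layer whose width grows with the number of patches.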

Cite This Study

Wei et al. (Fri,) studied this question.

synapsesocial.com/papers/69edad4b4a46254e215b4e7e https://doi.org/10.1021/acs.jcim.6c00180
