Atomic-scale defects govern many functional properties of materials, yet identifying and quantifying them systematically remains challenging. Supervised learning approaches require extensive labeled datasets, which are scarce in atomic-resolution microscopy because defect structures are complex and diverse. To overcome this limitation, we introduce a fully unsupervised machine learning framework that discovers and clusters defect structures without prior labeling or predefined defect classes. The framework employs a convolutional variational autoencoder (CVAE) to reconstruct ideal, defect-free images, enabling the generation of difference images that isolate local structural anomalies. From these difference images, 47 features are extracted and refined through a three-tier feature selection process to minimize redundancy and noise. Dimensionality reduction via principal component analysis (PCA), combined with silhouette score optimization, determines the optimal number of clusters before k-means clustering is applied, yielding well-separated groups that correspond to distinct defect types. Validated on CdTe and SrTiO3 datasets, this label-free approach enables high-throughput defect discovery and clustering in scanning transmission electron microscopy (STEM) and related imaging modalities.
Ayyubi et al. (Tue,) studied this question.
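
The clustering stage described in the abstract (PCA, silhouette-based choice of the cluster number, then k-means) can be illustrated with a minimal sketch using scikit-learn. This is not the authors' implementation: the function name `choose_k_and_cluster`, the candidate range of k, the two PCA components, the standardization step, and the synthetic 12-column feature matrix are all assumptions made for illustration. The CVAE reconstruction, the difference images, and the three-tier feature selection are not modeled here; the synthetic matrix simply stands in for already-selected features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.decomposition import PCA
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import StandardScaler


def choose_k_and_cluster(features, k_range=range(2, 9), n_components=2, seed=0):
    """Standardize features, project with PCA, pick k by silhouette, run k-means.

    Hypothetical helper: parameter defaults are illustrative assumptions,
    not values taken from the paper.
    """
    scaled = StandardScaler().fit_transform(features)
    embedded = PCA(n_components=n_components, random_state=seed).fit_transform(scaled)

    # Score every candidate cluster number on the PCA embedding.
    scores = {}
    for k in k_range:
        candidate = KMeans(n_clusters=k, n_init=10, random_state=seed)
        scores[k] = silhouette_score(embedded, candidate.fit_predict(embedded))

    # Keep the k with the highest mean silhouette score, then cluster once more.
    best_k = max(scores, key=scores.get)
    final = KMeans(n_clusters=best_k, n_init=10, random_state=seed)
    return best_k, final.fit_predict(embedded), scores


# Synthetic stand-in for a selected-feature matrix (assumption, not real data):
# three groups of 60 samples, each elevated on its own block of four features.
rng = np.random.default_rng(0)
centers = np.zeros((3, 12))
for i in range(3):
    centers[i, 4 * i:4 * (i + 1)] = 10.0
features = np.vstack([c + rng.normal(scale=0.5, size=(60, 12)) for c in centers])

best_k, labels, scores = choose_k_and_cluster(features)
print("selected k:", best_k)
print("cluster sizes:", np.bincount(labels))
```

Scoring the candidates on the PCA embedding rather than on the raw features keeps the selection of k and the final clustering in the same space. Whether the original work scores in the reduced or the full feature space is not stated in the abstract.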