Background: Human skin and saliva microbial communities have emerged as promising forensic biomarkers because of their individual specificity. However, existing studies are limited by small sample sizes and methodological inconsistencies. This proof-of-concept study aims to develop a novel framework that integrates 2bRAD-M sequencing with a hierarchical attention network (HAN) for forensic individual identification, addressing these limitations through large-scale public data integration and controlled validation.

Methods: We used 2263 skin and saliva samples from public databases (Qiita, HMP, NCBI SRA) for model development; these included longitudinal samples collected over periods of up to 180 days. For validation, a contemporary cohort of 6 volunteers provided 26 forensically relevant samples (including simulated touch evidence), which were sequenced using 2bRAD-M. Data integration involved batch effect correction (ComBat), normalization (CSS), and cross-database harmonization using GTDB for taxonomic assignment. The HAN model was optimized with triplet margin loss for metric learning.

Results: The HAN model achieved 98.7% Rank-1 accuracy on pristine samples, outperforming random forest (70.2%) and CNN (75.8%) models. Microbial signatures showed high temporal stability (ICC = 0.86 over 180 days) and remained robust in mixed samples (87.4% accuracy). Discriminatory biomarkers included Cutibacterium (skin) and Prevotella (saliva). Particulate matter exposure significantly influenced microbial composition (PERMANOVA R² = 0.32, p < 0.001).

Conclusions: This study establishes a proof-of-concept pipeline for microbial forensics, demonstrating high accuracy under controlled conditions. Future work must address antibiotic exposure, sample diversity, and cross-laboratory validation before forensic implementation.
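
The triplet margin loss and the Rank-1 accuracy metric named above can be illustrated with a minimal sketch. The abstract does not specify the distance function, the margin, or the matching procedure, so the Euclidean distance, the margin of 1.0, the nearest-neighbor gallery matching, and the toy two-dimensional embeddings below are all assumptions made for illustration, not the study's implementation.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Loss for one triplet: max(d(a, p) - d(a, n) + margin, 0).

    The loss is zero once the same-individual sample (positive) is closer
    to the anchor than the other-individual sample (negative) by at least
    `margin`. The margin value is an assumption.
    """
    d_ap = euclidean(anchor, positive)
    d_an = euclidean(anchor, negative)
    return max(d_ap - d_an + margin, 0.0)


def rank1_accuracy(queries, gallery):
    """Fraction of queries whose nearest gallery embedding has the same ID.

    `queries` and `gallery` are lists of (person_id, embedding) pairs.
    """
    hits = 0
    for true_id, query_embedding in queries:
        nearest_id, _ = min(
            gallery, key=lambda item: euclidean(query_embedding, item[1])
        )
        hits += nearest_id == true_id
    return hits / len(queries)


# Hypothetical toy embeddings for two individuals, "A" and "B".
gallery = [("A", [0.0, 0.0]), ("B", [3.0, 4.0])]
queries = [
    ("A", [0.1, -0.1]),  # closest to A: correct match
    ("B", [2.9, 4.2]),   # closest to B: correct match
    ("A", [2.5, 3.5]),   # closest to B: wrong match
]

# Well-separated triplet: d(a, p) = 0, d(a, n) = 5, so the loss is 0.
loss_easy = triplet_margin_loss([0.0, 0.0], [0.0, 0.0], [3.0, 4.0])
# Violating triplet: d(a, p) = 5, d(a, n) = 0, so the loss is 5 + 1 = 6.
loss_hard = triplet_margin_loss([0.0, 0.0], [3.0, 4.0], [0.0, 0.0])
acc = rank1_accuracy(queries, gallery)

print(loss_easy, loss_hard, acc)
```

In this toy setting two of the three queries are matched to the correct individual, so the Rank-1 accuracy is 2/3. In a real pipeline the embeddings would come from the trained network rather than being written by hand.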
Li et al. (Thu,) studied this question.