Abstract

Objective: We propose KnvResGAT for efficient SARS-CoV-2 lineage classification by combining k-mer Natural Vector (KNV) representations with a residual multi-head Graph Attention Network (GAT) on a k-nearest-neighbor (kNN) similarity graph constructed in the KNV feature space.

Results: On a time-aware per-lineage split of 182,851 curated SARS-CoV-2 genomes spanning 103 Pango lineages, KnvResGAT achieved 0.9729 accuracy and 0.9636 Macro-F1. Under the same split, it outperformed Pangolin (0.9673 accuracy, 0.9471 Macro-F1) and a strong deep-learning baseline, ResMLP (0.9654 accuracy, 0.9520 Macro-F1), demonstrating improved generalization for multi-class lineage classification.
Yu et al. (Thu,) studied this question.
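
To make the first two stages of the pipeline concrete, the following is a minimal sketch of a k-mer natural vector and a kNN graph built on top of it. It is illustrative only and is not the authors' implementation. It assumes one common KNV definition (per k-mer: occurrence count, mean 1-based position, and second central moment scaled by the count and the number of windows) and Euclidean distance for the neighbor search; the abstract does not specify either choice. The function names `kmer_natural_vector` and `knn_graph`, and the defaults `k=3` and `k_neighbors=2`, are hypothetical.

```python
from itertools import product
import math

ALPHABET = "ACGT"


def kmer_natural_vector(seq, k=3):
    """Return [counts..., mean positions..., scaled second moments...].

    Assumed definition (not confirmed by the source): for each of the 4**k
    k-mers, the count n_w, the mean 1-based start position mu_w, and
    D2_w = sum((p - mu_w)**2) / (n_w * total), where total = len(seq) - k + 1.
    """
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    positions = {w: [] for w in kmers}
    total = len(seq) - k + 1
    for i in range(total):
        window = seq[i:i + k]
        # Windows containing ambiguous bases (for example N) are skipped.
        if window in positions:
            positions[window].append(i + 1)

    counts, means, moments = [], [], []
    for w in kmers:
        pos = positions[w]
        n = len(pos)
        if n == 0:
            # Absent k-mers contribute zeros to all three components.
            counts.append(0.0)
            means.append(0.0)
            moments.append(0.0)
            continue
        mu = sum(pos) / n
        counts.append(float(n))
        means.append(mu)
        moments.append(sum((p - mu) ** 2 for p in pos) / (n * total))
    return counts + means + moments


def knn_graph(vectors, k_neighbors=2):
    """Return directed edges (i, j) where j is one of i's nearest neighbors.

    Euclidean distance is an assumption made for this sketch.
    """
    edges = []
    for i, v in enumerate(vectors):
        ranked = sorted(
            (math.dist(v, u), j) for j, u in enumerate(vectors) if j != i
        )
        edges.extend((i, j) for _, j in ranked[:k_neighbors])
    return edges
```

In a full pipeline, each genome would become one node whose feature vector is its KNV, and the resulting edge list would be passed to a graph attention model; that model is omitted here because the abstract gives no architectural details beyond "residual multi-head GAT".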