Abstract Motivation Verification of sample sex is an essential quality control step in next-generation sequencing studies, typically assessed from genomic data. Clustering individuals by X chromosome heterozygosity (Xhet) and incorporating relatedness estimates offers a practical first-pass screen for potential sex label errors, sample mix-ups, and pedigree inconsistencies. To better interpret Xhet based patterns, we further investigated the biological and technical origins using the 1000 Genomes Project dataset. Results We developed XhetRel, a user-friendly workflow and notebook application that computes Xhet and performs relatedness estimation directly from VCF files. As a fully genotype-based approach, XhetRel enables both sex-based clustering and relatedness assessment as an initial quality control (QC) step in NGS. XhetRel serves groups without bioinformatics infrastructure, users requiring a browser-based QC tool, and workflow developers seeking a modular Nextflow component. Our investigation into the sources of Xhet variation highlighted important limitations in sequencing and variant-calling approaches. In particular, specific pseudogenes and gene clusters, such as SLC25A5 and the GAGE cluster, as recurrent contributors to misleading variant allele fractions. Availability and implementation The source code and data are available at Figshare (doi : 10.6084/m9.figshare.28280414). XhetRel can be executed locally via Nextflow or accessed directly through the online Collab notebook at https://colab.research.google.com/drive/1ep69JvXLwK5ndHUQ8qIGTWvauzsTW9fi.
Salman et al. (Tue,) studied this question.