Most ancestry inference methods rely on putatively pure reference panels to define ancestry-informative variants. This approach is often unrealistic and can bias inference. The genome polarisation algorithm diem, introduced by Baird et al. (2023), avoids reference panels by jointly inferring the polarity of common allelic states and quantifying variant diagnosticity via an expectation-maximisation procedure. We use ``polarisation'' strictly to mean the assignment of alleles to opposing sides of a barrier to gene flow, not the assignment of ancestral versus derived states. Here we present diempy, an efficient Python implementation of diem coupled with tools that turn polarised calls into analysis-ready outputs. diempy offers lossless VCF-to-diem BED conversion; ploidy-aware handling of individuals and chromosomes; flexible masking of sites, regions and individuals; and interactive visualisation of polarised genomes, hybrid indices, clines and ternary plots. Post-processing functions include thresholding via the diagnostic index (DI), kernel smoothing, and automatic detection and run-length encoding of contiguous ancestry tracts. BED-based I/O facilitates integration with population-genomic workflows (e.g. filtering by annotation or ploidy). These features make reference-panel-free genome polarisation with diempy practical and reproducible for studies of population structure, admixture and species barriers.
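To illustrate what run-length encoding of contiguous ancestry tracts involves, the following is a minimal sketch in plain Python. It does not use or reproduce the diempy API: the function name `run_length_encode_tracts`, the integer state coding (0 and 2 for the two homozygous polarised states, 1 for heterozygous) and the example positions are all assumptions made for illustration only.

```python
from itertools import groupby


def run_length_encode_tracts(states, positions):
    """Collapse per-site states into contiguous tracts.

    Hypothetical helper, not part of diempy. Returns a list of
    (state, start_position, end_position, n_sites) tuples, one per
    maximal run of identical consecutive states.
    """
    if len(states) != len(positions):
        raise ValueError("states and positions must have the same length")

    tracts = []
    idx = 0  # index of the first site in the current run
    for state, group in groupby(states):
        n_sites = sum(1 for _ in group)
        start = positions[idx]
        end = positions[idx + n_sites - 1]
        tracts.append((state, start, end, n_sites))
        idx += n_sites
    return tracts


# Illustrative (made-up) polarised states for one individual on one chromosome.
states = [0, 0, 0, 1, 1, 2, 2, 2, 2, 0]
positions = [100, 250, 400, 900, 1200, 3000, 3100, 3500, 4000, 5200]

tracts = run_length_encode_tracts(states, positions)
for state, start, end, n_sites in tracts:
    print(state, start, end, n_sites)
```

Each output tuple maps naturally onto a BED-like interval (start, end, label), which is one reason an interval-based representation is convenient for downstream filtering.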
Setter et al. (Mon,) studied this question.