To substantially increase the availability of human molecular data for genetic ancestry-informed research, we recently developed RAIDS (Robust Ancestry Inference using Data Synthesis), an R package. The initial Bioconductor release of RAIDS, in the fall of 2023, provided inference of global genetic ancestry, at continental resolution, from sequence data that are challenging to use for this purpose. These include data from RNA-seq, from small panels targeting a few hundred genes, and from whole-genome or whole-exome sequencing of DNA with severe somatic alterations caused by cancer. A key distinguishing feature of RAIDS is its ability to tune the inference parameters for optimal performance, and to assess the accuracy of inference at that optimum, specifically for the sequence data at hand from a given human nucleic acid donor. Underlying this ability is a novel technique for synthesizing read data with any given coverage and read quality and with known, "ground-truth" genotypes throughout the donor's genome. Through subsequent development, RAIDS has undergone a major upgrade and now supports inference of genetic ancestry from read data originating in ATAC-seq, single-cell RNA-seq, ChIP-seq, and bisulfite conversion assays. Furthermore, RAIDS now enables inference of ancestral admixture, for a more in-depth characterization of the donor's ancestral background. The expanded and refined RAIDS providing these functionalities will be submitted to Bioconductor for its next release. In addition, RAIDS can now be invoked on the Galaxy platform and will, in the near future, be available from the Galaxy Tool Shed.
Belleau et al. studied this question.