Approximately 98% of the human genome does not code for proteins. Conserved non-coding elements (CNEs) that are present in humans but absent from other primates remain particularly understudied. Using public data and tools (phastCons100way, GENCODE v46, liftOver, RepeatMasker), we selected CNEs that are at least 100 bp long, non-coding, absent from the chimpanzee, gorilla, orangutan, and macaque genomes (zero-overlap criterion, liftOver run with -minMatch=0.1), and not simple repeats. The genomic distribution of these elements was analyzed with bedtools closest and a permutation test (n = 10,000). We identified 1,415 human-specific non-coding elements, of which 550 lie on the sex chromosomes. Of these 550, 113 (20.5%) fall within pseudoautosomal region 1 (PAR1) of chromosomes X and Y (coordinates 10,001–2,781,479 bp), whereas 17 (3.1%) would be expected under a random distribution (p < 0.001). The odds ratio is 8.2 (95% CI: 4.8–14.0). Human-specific non-coding elements are therefore significantly enriched in PAR1 of the sex chromosomes. This region may represent a hotspot for recent evolutionary insertions in the human genome.
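The enrichment statistics above can be illustrated with a short sketch. It is not the authors' pipeline: it assumes a simple null model in which each of the 550 sex-chromosome elements falls in PAR1 independently with probability 0.031 (the 3.1% expectation quoted above), and it computes the odds ratio with a Woolf (log-normal) confidence interval, which the abstract does not name. The function names `permutation_p_value` and `odds_ratio_woolf` are hypothetical, and the 2x2 table is built from the rounded counts in the abstract, so the result may differ slightly from the reported 8.2 (4.8–14.0).

```python
import math
import random


def permutation_p_value(observed, n_elements, null_fraction,
                        n_perm=10_000, seed=0):
    """One-sided empirical p-value for >= `observed` hits under the null.

    Assumption: each element lands in the region of interest independently
    with probability `null_fraction` (a simplified random-placement model).
    """
    rng = random.Random(seed)
    at_least_as_extreme = 0
    for _ in range(n_perm):
        hits = sum(rng.random() < null_fraction for _ in range(n_elements))
        if hits >= observed:
            at_least_as_extreme += 1
    # Add-one correction: the estimate can never be exactly zero.
    return (at_least_as_extreme + 1) / (n_perm + 1)


def odds_ratio_woolf(a, b, c, d, z=1.96):
    """Odds ratio (a*d)/(b*c) with a Woolf log-scale confidence interval."""
    odds_ratio = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log)
    upper = math.exp(math.log(odds_ratio) + z * se_log)
    return odds_ratio, lower, upper


# Counts taken from the abstract: 113 of 550 observed in PAR1, 17 expected.
p_value = permutation_p_value(observed=113, n_elements=550, null_fraction=0.031)
odds_ratio, ci_low, ci_high = odds_ratio_woolf(113, 550 - 113, 17, 550 - 17)

print(f"empirical p-value: {p_value:.5f}")
print(f"odds ratio: {odds_ratio:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

With 10,000 permutations the smallest attainable p-value is 1/10,001, which is why a result of this kind is reported as p < 0.001 rather than as an exact value.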
Ilya Merezhko (Thu,) studied this question.