Abstract

Animal venomics is a growing field of research with evolutionary and biotechnological significance. Yet fundamental questions regarding the origin, diversification, and bioactivity of venoms remain unresolved. Here, we analysed venom tissue-related data curated in Tox-Prot, currently the most comprehensive database for animal venoms, across three snapshots spanning two decades (2005, 2015, 2025). We assessed the taxonomic landscape of Tox-Prot entries, sequence length distributions, protein family abundances, and habitat-specific venom patterns. Our results consistently show that snakes, spiders, cone snails, and scorpions, along with their associated protein families such as the Phospholipase A2, Snake Three Finger Toxin, and Long 4(C-C) scorpion toxin families, dominate Tox-Prot. Nevertheless, taxonomic and protein family diversity has increased steadily, with 503 new species and 188 new protein families added by 2025 compared to the 2005 dataset. Marine species account for 16%–20% of total species, of which 63%–85% are Neogastropoda, reflecting limited coverage of marine species diversity; likewise, terrestrial taxa are disproportionately represented by Squamata (39%) and Hymenoptera (20%) relative to their natural diversities of 2.3% and 50.26%, respectively. At the molecular level, half of all entries correspond to mature peptide sequences of 26–75 amino acids, featuring three to four disulfide bridges and C-terminal amidations as the most frequently recorded post-translational modifications. Further, protein language model embeddings reveal taxonomically diverse peptide clusters and clearly delimited clusters of large enzymes. Our study maps two decades of venom data diversification, reflecting both the field’s rapid expansion and the need for robust, integrated datasets to propel and disseminate that knowledge.
Kirchhoff et al. studied this question.