Gene co-expression networks (GCNs) can reveal useful gene co-functional and co-regulatory relationships. However, current GCN construction methodologies are sensitive to batch effects and sample composition, which limits their performance when GCNs are built from the public RNA-seq samples that are abundant for many species. Here, we report the development of TEA-GCN (two-tier ensemble aggregation-GCN; https://github.com/pengkenlim/TEA-GCN), a GCN construction method that leverages unsupervised transcriptomic dataset partitioning and multi-metric co-expression scoring to derive ensemble gene co-expression. In benchmarks covering over 450,000 public RNA-seq samples across 12 species, TEA-GCN outperforms the state of the art in predicting gene functions and inferring gene regulatory networks. Through the use of natural language processing, we also show that biologically relevant dataset partitions with high co-expression can identify tissue-/condition-specific co-expression in TEA-GCN, providing a high level of explainability. Furthermore, we show that TEA-GCNs exhibit enhanced conservation across species, making them suitable for multi-species comparative studies.
Lim et al. (Thu,) studied this question.
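To make the idea of partitioning samples and combining several co-expression metrics more concrete, the following is a minimal sketch in plain Python. It is not the TEA-GCN implementation: the function names (`kmeans_partition`, `ensemble_coexpression`), the use of k-means for partitioning, the choice of Pearson and Spearman as the metrics, and the aggregation rules (maximum across partitions, then mean across metrics) are all assumptions made for illustration. The abstract does not specify these details.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation computed as Pearson correlation of ranks."""
    return pearson(ranks(x), ranks(y))


def kmeans_partition(samples, k, iters=20, seed=0):
    """Hypothetical partitioning step: plain k-means over sample vectors.

    samples: list of expression vectors, one per sample.
    Returns one partition label per sample.
    """
    rng = random.Random(seed)
    centers = rng.sample(samples, k)
    labels = [0] * len(samples)
    for _ in range(iters):
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(s, centers[c])))
            for s in samples
        ]
        for c in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old center if a partition is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


METRICS = (pearson, spearman)  # assumed metric set, for illustration only


def ensemble_coexpression(samples, gene_a, gene_b, k=2, min_size=3):
    """Assumed two-tier aggregation for one gene pair.

    Tier 1: for each metric, take the maximum score across partitions.
    Tier 2: average the per-metric maxima.
    """
    labels = kmeans_partition(samples, k)
    best_per_metric = []
    for metric in METRICS:
        scores = []
        for c in range(k):
            x = [s[gene_a] for s, lab in zip(samples, labels) if lab == c]
            y = [s[gene_b] for s, lab in zip(samples, labels) if lab == c]
            if len(x) >= min_size:  # skip partitions too small to score
                scores.append(metric(x, y))
        if scores:
            best_per_metric.append(max(scores))
    if not best_per_metric:
        return 0.0
    return sum(best_per_metric) / len(best_per_metric)


# Toy data: columns are gene 0, gene 1, and a condition marker gene.
# Genes 0 and 1 track each other only in the first condition.
toy_samples = (
    [[float(i), 2.0 * i, 0.0] for i in range(1, 6)]
    + [[float(i), v, 100.0] for i, v in zip(range(1, 6), [7.0, 3.0, 9.0, 1.0, 5.0])]
)
print(ensemble_coexpression(toy_samples, 0, 1, k=2))
```

In this toy setup, taking the maximum across partitions lets a relationship that holds in only one condition still contribute a high score, which is one plausible reading of how partition-level scores could point to tissue- or condition-specific co-expression. A real pipeline would operate on full expression matrices and would likely rely on vectorized numerical libraries rather than pure-Python loops.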