What question did this study set out to answer?

The study assesses how well KANBind predicts DNA-binding proteins under strict homology control and realistic class imbalance.

April 23, 2026

KANBind as a diagnostic probe for DNA-binding protein prediction: A prevalence-calibrated reality check under strict homology control

Key Points

The study assesses KANBind's prediction of DNA-binding proteins (DBPs) under strict homology control and realistic class imbalance.
KANBind was tested on the homology-controlled HBTD benchmark.
Results were reported with prevalence calibration to reflect realistic proteome-wide scans.
An interpretability analysis was conducted to identify what drives the predictions.
KANBind achieved a calibrated precision of 0.0558, implying an expected false discovery rate of 94.42%.
In a proteome-scale scan, approximately 95 of every 100 predicted DBPs would be false positives.
Predictions were driven mainly by coarse physicochemical cues, especially electrostatics, rather than by generalizable functional rules.
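The link between the reported precision and the false discovery rate is simple arithmetic: FDR is the complement of precision. The following is a minimal sketch of that conversion, not code from the study; the function names are hypothetical, and only the value 0.0558 comes from the source.

```python
def false_discovery_rate(precision: float) -> float:
    """Return FDR = FP / (TP + FP), which equals 1 - precision."""
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    return 1.0 - precision


def expected_false_positives(precision: float, n_predicted: int) -> float:
    """Expected number of false positives among n_predicted positive calls."""
    return false_discovery_rate(precision) * n_predicted


# Calibrated precision reported for KANBind in the study.
calibrated_precision = 0.0558

fdr = false_discovery_rate(calibrated_precision)
print(f"Expected FDR: {fdr:.2%}")
print(
    "Expected false positives per 100 predicted DBPs: "
    f"{expected_false_positives(calibrated_precision, 100):.1f}"
)
```

With the reported precision, this gives an FDR of 94.42%, or roughly 94 to 95 false positives in every 100 predicted DBPs.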

Abstract

Deep learning models report over 90% DNA-binding protein (DBP) prediction performance on common benchmarks, but these results are usually obtained on balanced test sets and may not translate to proteome-wide scans with extreme class imbalance. Here, we use KANBind as a diagnostic probe to stress-test sequence-based DBP prediction under strict homology control and realistic prevalence. Evaluated on the homology-controlled HBTD benchmark with prevalence-calibrated reporting, KANBind achieves a calibrated precision of 0.0558 at a realistic bacterial prevalence (Formula: see text), implying an expected false discovery rate (FDR) of 94.42%. In a proteome-scale scan, this corresponds to approximately 95 false positives per 100 predicted DBPs. Interpretability analysis indicates that predictions are driven mainly by coarse physicochemical cues such as electrostatics, which may be necessary for DNA binding but are insufficient to determine DBP function. Together, these results suggest that apparent benchmark gains can be dominated by homology leakage and evaluation on balanced sets rather than by generalizable functional rules, motivating stress-test benchmarks with strict homology control and realistic negative backgrounds.
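To illustrate why precision measured on a balanced test set can collapse at realistic prevalence, the sketch below applies the standard Bayes relation between sensitivity, specificity, and prevalence. It is an assumption that this matches the study's calibration procedure, which is not spelled out here. The function name and all numeric inputs are hypothetical; they are not KANBind's operating point or the prevalence used in the study.

```python
def precision_at_prevalence(
    sensitivity: float, specificity: float, prevalence: float
) -> float:
    """Precision (positive predictive value) at a given class prevalence.

    Uses Bayes' rule:
        precision = TPR * p / (TPR * p + FPR * (1 - p))
    where TPR = sensitivity, FPR = 1 - specificity, p = prevalence.
    """
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    if true_pos + false_pos == 0.0:
        raise ValueError("no positive predictions; precision is undefined")
    return true_pos / (true_pos + false_pos)


# Hypothetical classifier: 90% sensitivity and 90% specificity.
# Illustrative values only, not taken from the study.
sens, spec = 0.90, 0.90

# Balanced benchmark (50% positives) versus a rare-positive scan (1% positives).
for prevalence in (0.50, 0.01):
    ppv = precision_at_prevalence(sens, spec, prevalence)
    print(f"prevalence={prevalence:.2f}  precision={ppv:.4f}")
```

Under these illustrative inputs, the same classifier goes from a precision of 0.90 on the balanced set to about 0.08 when only 1% of proteins are positives, which is the kind of gap that prevalence-calibrated reporting is meant to expose.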
