What question did this study set out to answer?

The study asks how reproducible RNA-ligand binding site predictions are when two pretrained RNA language models are combined, and how much single-seed evaluation inflates reported performance.

March 30, 2026

A Dual-Language-Model Framework for Reproducibility in Small Molecule-RNA Binding Site Prediction

Key Points

Integrated two large pretrained RNA language models (RNA-FM and RiNALMo).
Conducted multiple training runs on the TR60/TE18 benchmark.
Examined various fusion architectures including reverse cross-attention and concat fusion.
Identified a 32.8% overestimation of performance when a single-seed peak is reported instead of the multi-seed mean.
The best single run of the Reverse Cross-Attention model reached an MCC of 0.353, above the reported state of the art (0.327), whereas multi-seed replication averaged only 0.266 ± 0.020.
Found that simple concat fusion was markedly more stable across runs than attention-based fusion.
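The figures above are Matthews correlation coefficients (MCC). As a reminder of what that metric measures, here is a minimal sketch of the standard MCC formula for binary per-nucleotide labels. The function names and the toy labels are hypothetical and are not taken from the study; returning 0.0 when the denominator is zero is a common convention assumed here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels (1 = binding site, 0 = not)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts."""
    numerator = tp * tn - fp * fn
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Assumed convention: an empty row or column of the matrix gives 0.0.
    return numerator / denominator if denominator else 0.0


# Hypothetical per-nucleotide labels for one short RNA (illustration only).
y_true = [1, 1, 0, 0, 0, 0, 1, 0]
y_pred = [1, 0, 0, 0, 1, 0, 1, 0]
score = mcc(*confusion_counts(y_true, y_pred))
print(f"MCC = {score:.3f}")
```

Because MCC uses all four cells of the confusion matrix, it stays informative when binding nucleotides are a small minority of positions, which is the usual situation in binding site prediction.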

Abstract

Single-seed evaluation, the dominant reporting practice in small-dataset molecular learning, can substantially inflate performance estimates yet remains largely unexamined. We present the first systematic reproducibility analysis for RNA-ligand binding site prediction by integrating two large pretrained RNA language models (RNA-FM and RiNALMo) across multiple fusion architectures and replicated training runs on the TR60/TE18 benchmark. Our analysis reveals a pronounced Peak-SOTA Paradox: a favorable initialization in the Reverse Cross-Attention model reached an MCC of 0.353, surpassing the reported state-of-the-art (0.327), whereas multi-seed replication yielded only 0.266 ± 0.020, a 32.8% overestimation. Across architectures, mean accuracy remained tightly clustered, yet reproducibility varied substantially. Simple concat fusion strategies exhibited markedly higher stability than attention-based models, indicating that architectural entanglement rather than parameter count governs variance under data scarcity. Collectively, these findings establish reproducibility as a primary evaluation criterion for small-sample molecular prediction and motivate a dual-reporting standard in which mean ± SD serves as the principal metric and peak scores as supplementary evidence. This variance-aware perspective highlights that single-seed evaluations can misrepresent expected performance by 20-30% in limited-sample regimes.
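The dual-reporting standard described in the abstract can be sketched in a few lines. The example below is an illustration under stated assumptions, not the authors' code: the per-seed scores are made up, the function name is hypothetical, the sample standard deviation (n - 1) is assumed because the abstract does not say which estimator was used, and overestimation is assumed to mean the peak's excess over the mean, relative to the mean. Under that definition the rounded figures in the abstract give (0.353 - 0.266) / 0.266, or roughly 32.7%, close to the reported 32.8%.

```python
import statistics


def dual_report(scores):
    """Summarize per-seed scores as mean, SD, peak, and peak overestimation.

    Assumptions (not confirmed by the source): sample SD with n - 1, and
    overestimation measured relative to the multi-seed mean.
    """
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)  # sample standard deviation
    peak = max(scores)
    overestimation_pct = (peak - mean) / mean * 100.0
    return {
        "mean": mean,
        "sd": sd,
        "peak": peak,
        "overestimation_pct": overestimation_pct,
    }


# Made-up MCC values from five seeds (illustration only, not study data).
seed_scores = [0.30, 0.25, 0.27, 0.24, 0.29]
report = dual_report(seed_scores)

# Principal metric: mean ± SD. Supplementary evidence: the peak score.
print(f"MCC = {report['mean']:.3f} ± {report['sd']:.3f}")
print(f"peak = {report['peak']:.3f} "
      f"({report['overestimation_pct']:.1f}% above the mean)")
```

Reporting both lines keeps the best run visible while making clear that the mean and its spread, not the peak, describe what a new training run should be expected to deliver.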
