Accurate prediction of bacterial virulence factors (VFs) is crucial for combating infectious diseases, yet traditional methods often fail to capture their complex sequence properties. We address this challenge by leveraging deep, context-aware representations from large-scale protein language models (PLMs). Our framework begins with a systematic engineering of features from ESM-2 and ProtT5, which confirmed their complementary nature but also revealed that simple concatenation is a suboptimal fusion strategy due to a "feature overshadowing" effect. To overcome this, we developed two novel architectures: VF-Iter, for robust feature enhancement via iterative low-rank updates, and the Dual-Path Feature Fusion (DPF) network, for intelligently integrating the complementary embeddings. The construction of our final model, VF-Fuse, involved a two-stage process. First, we selected four powerful and diverse base models representing our distinct feature strategies (ESM-2 only, ProtT5 only, simple concatenation, and DPF). Second, we empirically determined the best method for combining their predictions by benchmarking 15 ensemble techniques, from which Majority Voting emerged as the superior choice. On the independent test set, VF-Fuse establishes a new state of the art, achieving a superior F1-Score of 87.15% and a Matthews Correlation Coefficient of 73.61%. This F1-Score marks a significant 3.3% improvement over the previous best method, driven by an excellent balance between a high Sensitivity of 90.1% and a strong Specificity of 83.33%. Crucially, in-depth interpretability analyses validated our architectural design, demonstrating how the DPF model learns to intelligently route complementary features to specialized pathways.
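As a rough illustration of the final ensemble step described above, the following minimal sketch combines the hard 0/1 predictions of four base models by majority voting and then computes the reported metric types (Sensitivity, Specificity, F1-Score, Matthews Correlation Coefficient). It is not the authors' implementation. The function names `majority_vote` and `binary_metrics`, the toy prediction vectors, and the tie-breaking rule are all assumptions made for this example. In particular, four voters can split 2-2, and the abstract does not say how such ties are resolved, so the sketch exposes that choice as a `tie_label` parameter.

```python
from math import sqrt


def majority_vote(votes_per_model, tie_label=1):
    """Combine binary (0/1) predictions, given as one list per model.

    tie_label is a hypothetical parameter: the label assigned when the
    vote is split evenly (possible with an even number of models).
    """
    n_models = len(votes_per_model)
    combined = []
    for sample_votes in zip(*votes_per_model):
        positives = sum(sample_votes)
        negatives = n_models - positives
        if positives > negatives:
            combined.append(1)
        elif positives < negatives:
            combined.append(0)
        else:
            combined.append(tie_label)  # assumed tie rule, not from the paper
    return combined


def binary_metrics(y_true, y_pred):
    """Return sensitivity, specificity, F1 and MCC for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "f1": f1,
        "mcc": mcc,
    }


# Toy predictions for six proteins; values are invented for illustration only.
esm2_only = [1, 1, 0, 0, 1, 0]
prott5_only = [1, 0, 0, 1, 1, 0]
concatenation = [1, 1, 0, 0, 0, 0]
dpf = [1, 1, 1, 0, 1, 0]
y_true = [1, 1, 0, 0, 1, 1]

y_pred = majority_vote([esm2_only, prott5_only, concatenation, dpf])
metrics = binary_metrics(y_true, y_pred)
print(y_pred)
print({name: round(value, 4) for name, value in metrics.items()})
```

The toy data contains no ties, so `tie_label` does not affect its result. With real predictions from an even number of models, the tie rule shifts the balance between Sensitivity and Specificity and would need to be chosen deliberately.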
Lucheng Huang
Xiangyu Yu
Shumei Li
Anhui Agricultural University
Huang et al. studied this question.
www.synapsesocial.com/papers/68d4759031b076d99fa6d45c — DOI: https://doi.org/10.1093/bib/bbaf481