
May 31, 2026 | Open Access

Reinforcement Learning Based Design of Protein-binding RNA Sequences

Key Points

Aims to improve the generation of RNA sequences that bind specific target proteins by using reinforcement learning.
Developed AptaRL, a model that generates RNA sequences conditioned on a target protein sequence.
Pretrained an autoregressive Transformer decoder on protein-RNA pairs, using residue embeddings from a pretrained protein language model.
Used reinforcement learning to optimize the score of an external binding predictor (LucaOne) while penalizing off-target binding and enforcing aptamer-like properties.
Reinforcement learning increases the proportion of RNA sequences with high DeepCLIP scores, indicating improved predicted binding potential.
Auxiliary rewards mitigate mode collapse and improve sequence diversity, supporting in silico pre-screening before experiments.
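Mode collapse here means the reinforcement-learning-tuned generator concentrates on a few near-identical sequences. The source does not state which diversity metrics were used, so the following is only a minimal sketch, assuming two common choices: the fraction of unique sequences in a sample and the mean pairwise Levenshtein distance normalized by sequence length. The function names are hypothetical.

```python
from itertools import combinations
from typing import Sequence


def unique_fraction(seqs: Sequence[str]) -> float:
    """Share of distinct sequences in a sample (1.0 means no duplicates)."""
    return len(set(seqs)) / len(seqs) if seqs else 0.0


def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution or match
        prev = cur
    return prev[-1]


def mean_pairwise_distance(seqs: Sequence[str]) -> float:
    """Average length-normalized edit distance over all sequence pairs."""
    pairs = list(combinations(seqs, 2))
    if not pairs:
        return 0.0
    total = sum(levenshtein(a, b) / max(len(a), len(b), 1) for a, b in pairs)
    return total / len(pairs)
```

A collapsed generator would show a low unique fraction and a mean pairwise distance near zero; auxiliary rewards that counter collapse should push both values up.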

Abstract

Nucleic-acid aptamers can bind target proteins with high affinity and specificity; however, experimental screening procedures such as SELEX require iterative rounds of selection, are costly and time-consuming, and are susceptible to experimental biases. Although recent RNA/aptamer generation studies have explored de novo modeling and structure-aware optimization, practical gaps remain for target-specific design: generation is often not directly conditioned on the target protein and may rely on structural information or known motifs, and many learning-based approaches optimize the full RNA sequence uniformly, even though binding is governed by localized protein-RNA contacts. In this study, we propose AptaRL, a protein-conditional RNA generation model that takes only a target protein sequence as input and generates candidate RNA sequences. We pretrain a protein-conditioned autoregressive Transformer decoder on protein-RNA pairs using residue embeddings from a pretrained protein language model, and then apply reinforcement learning (REINFORCE) to directly optimize the score of an external binding predictor (LucaOne) while suppressing off-target binding and enforcing aptamer-like properties (GC content, length, and RNAfold minimum free energy). DeepCLIP scoring indicates that reinforcement learning increases the proportion of high-scoring sequences, while auxiliary rewards mitigate mode collapse and improve diversity; together, these results support in silico pre-screening prior to experiments.
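The abstract lists the reward components (on-target binding, off-target suppression, GC content, length, and RNAfold minimum free energy) but not how they are combined. The sketch below shows one plausible weighted combination plus a REINFORCE loss. Everything specific is an assumption: the weights, the target ranges (GC 0.4 to 0.6, length 20 to 60 nt, MFE at or below -5 kcal/mol), the function names, and the batch-mean baseline, a common variance-reduction choice that the source does not mention. Predictor scores and MFE values are passed in as precomputed numbers rather than obtained by calling LucaOne or RNAfold.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class RewardWeights:
    # Hypothetical weights chosen for illustration only.
    on_target: float = 1.0   # predicted binding to the intended protein
    off_target: float = 0.5  # penalty for predicted binding to decoy proteins
    gc: float = 0.2
    length: float = 0.2
    mfe: float = 0.1


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in an RNA sequence."""
    if not seq:
        return 0.0
    return sum(base in "GC" for base in seq.upper()) / len(seq)


def range_penalty(value: float, low: float, high: float) -> float:
    """Zero inside [low, high], otherwise the distance to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def composite_reward(seq: str, on_score: float, off_score: float, mfe: float,
                     w: Optional[RewardWeights] = None,
                     gc_range=(0.4, 0.6), len_range=(20, 60),
                     mfe_max: float = -5.0) -> float:
    """Binding reward minus off-target and aptamer-property penalties.

    on_score/off_score stand in for binding-predictor outputs and mfe for a
    precomputed minimum free energy in kcal/mol (hypothetical inputs).
    """
    if w is None:
        w = RewardWeights()
    r = w.on_target * on_score - w.off_target * off_score
    r -= w.gc * range_penalty(gc_content(seq), *gc_range)
    # Scale the length penalty by the upper bound so it is roughly in [0, 1].
    r -= w.length * range_penalty(len(seq), *len_range) / len_range[1]
    # Penalize folds that are less stable (higher MFE) than mfe_max.
    r -= w.mfe * max(0.0, mfe - mfe_max)
    return r


def reinforce_loss(log_probs: Sequence[float], rewards: Sequence[float]) -> float:
    """REINFORCE objective with a batch-mean baseline.

    log_probs[i] is the summed token log-probability of sampled sequence i.
    In a real training loop these would be autograd tensors so that
    minimizing this loss raises the probability of above-baseline samples.
    """
    baseline = sum(rewards) / len(rewards)
    return -sum((r - baseline) * lp for lp, r in zip(log_probs, rewards)) / len(rewards)
```

Because the penalties are zero inside the target ranges, the binding terms dominate once a sequence is aptamer-like, which is one way to keep auxiliary constraints from overriding the main objective.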
