Nucleic-acid aptamers can bind target proteins with high affinity and specificity; however, experimental screening procedures such as SELEX require iterative rounds of selection, are costly and time-consuming, and are susceptible to experimental biases. Although recent RNA/aptamer generation studies have explored de novo modeling and structure-aware optimization, practical gaps remain for target-specific design: generation is often not directly conditioned on the target protein and may rely on structural information or known motifs, and many learning-based approaches optimize the full RNA sequence uniformly even though binding is governed by localized protein-RNA contacts. In this study, we propose AptaRL, a protein-conditioned RNA generation model that takes only a target protein sequence as input and generates candidate RNA sequences. We pretrain a protein-conditioned autoregressive Transformer decoder on protein-RNA pairs using residue embeddings from a pretrained protein language model, and then apply reinforcement learning (REINFORCE) to directly optimize the score of an external binding predictor (LucaOne) while suppressing off-target binding and enforcing aptamer-like properties (GC content, length, and RNAfold minimum free energy). DeepCLIP scoring indicates that reinforcement learning increases the proportion of high-scoring sequences; the auxiliary rewards mitigate mode collapse and improve diversity. Together, these results support the use of the model for in silico pre-screening prior to experiments.
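To make the reinforcement-learning objective more concrete, the sketch below combines an on-target binding score, an off-target penalty, and aptamer-like property terms (GC content, length, minimum free energy) into one scalar reward, then computes a REINFORCE surrogate loss with a mean-reward baseline. This is a minimal illustration under stated assumptions, not the AptaRL implementation. The function names, weights, target ranges, and the additive form of the reward are all hypothetical. The binding scores (standing in for LucaOne outputs) and the MFE (standing in for an RNAfold result) are passed in as plain numbers, and the log-probabilities are plain floats, so the loss function shows only the arithmetic. In real training they would be autodiff tensors.

```python
def gc_content(seq: str) -> float:
    """Fraction of G/C nucleotides in an RNA sequence."""
    if not seq:
        return 0.0
    return sum(1 for nt in seq.upper() if nt in "GC") / len(seq)


def range_penalty(value: float, low: float, high: float) -> float:
    """0 inside [low, high]; otherwise the distance to the nearest bound."""
    if value < low:
        return low - value
    if value > high:
        return value - high
    return 0.0


def composite_reward(seq, on_target, off_target, mfe,
                     gc_range=(0.4, 0.6), len_range=(20, 60),
                     w_off=1.0, w_gc=1.0, w_len=0.05, w_mfe=0.05):
    """Hypothetical composite reward (all ranges and weights are assumptions).

    on_target  -- predicted binding score to the target protein
                  (placeholder for an external predictor such as LucaOne)
    off_target -- predicted binding score to non-target proteins (penalized)
    mfe        -- minimum free energy in kcal/mol (placeholder for RNAfold)
    """
    reward = on_target - w_off * off_target
    reward -= w_gc * range_penalty(gc_content(seq), *gc_range)
    reward -= w_len * range_penalty(len(seq), *len_range)
    # A more negative MFE means a more stable fold, so it earns a larger bonus.
    reward += w_mfe * (-mfe)
    return reward


def reinforce_loss(log_probs, rewards):
    """REINFORCE surrogate loss with a mean-reward baseline.

    log_probs[i] is the summed token log-probability of sampled sequence i.
    Minimizing this loss (with gradients through log_probs) raises the
    probability of sequences whose reward exceeds the batch mean.
    """
    baseline = sum(rewards) / len(rewards)
    advantages = [r - baseline for r in rewards]
    return -sum(a * lp for a, lp in zip(advantages, log_probs)) / len(rewards)
```

For example, a 30-nt sequence with GC content inside the assumed range, an on-target score of 0.9, an off-target score of 0.2, and an MFE of -10.0 kcal/mol receives 0.9 - 0.2 + 0.05 * 10 = 1.2 under these hypothetical weights. Subtracting the batch-mean baseline is a common variance-reduction choice for REINFORCE; the source does not state which baseline, if any, AptaRL uses.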
Ishii et al. studied this question.