ABSTRACT Aptamers are promising molecular recognition elements with broad applications in diagnostics, therapeutics, and biosensing; however, their discovery remains labor-intensive and time-consuming because of the limitations of traditional SELEX-based workflows. In this study, we propose a transformer-based sequence-to-sequence framework that directly generates aptamer sequences conditioned on target protein sequences, enabling generative protein–aptamer design. The model uses k-mer tokenization (3-mers for proteins and 6-mers for aptamers) and leverages self-supervised pretraining on large-scale protein and RNA datasets to learn sequence and structural representations. We first evaluated the model on aptamer–protein interaction (API) prediction, where it achieved performance comparable to AptaTrans (ACC = 0.902, AUC = 0.918) while using a simpler architecture that is better suited to generative tasks. To further assess its design capability, we conducted a two-stage in silico validation on 100 randomly selected proteins from the PDB dataset. In cross-model evaluation, the proposed model significantly outperformed AptaTrans (mean binding score: 0.898 vs. 0.836, p < 2.22 × 10⁻¹⁶), indicating improved generalization and reduced self-consistency bias. In structure-based validation with HDOCK, both models showed comparable docking performance under a unified scoring framework, suggesting that the generated sequences remain structurally feasible binders. Finally, experimental validation with an ALISA assay showed that the predicted aptamer binds CCL4 in a concentration-dependent manner, with a strong linear correlation (R² = 0.95), confirming its target-specific binding capability.
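To make the k-mer tokenization step concrete, the following is a minimal sketch, not the authors' implementation. It assumes overlapping k-mers (stride 1), the 20 standard amino acids for proteins, the RNA alphabet ACGU for aptamers, and a small set of special tokens; the function names and the `SPECIAL_TOKENS` list are hypothetical. Under these assumptions, the 3-mer protein vocabulary holds 20³ = 8000 k-mers and the 6-mer aptamer vocabulary holds 4⁶ = 4096 k-mers.

```python
from itertools import product
from typing import Dict, List

PROTEIN_ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # 20 standard amino acids (assumption)
RNA_ALPHABET = "ACGU"                      # RNA nucleotides (assumption)
SPECIAL_TOKENS = ["<pad>", "<bos>", "<eos>", "<unk>"]  # hypothetical special tokens


def kmer_tokenize(seq: str, k: int, stride: int = 1) -> List[str]:
    """Split a sequence into k-mers; stride=1 yields overlapping k-mers (assumed)."""
    seq = seq.upper()
    # When len(seq) < k the range is empty, so no tokens are produced.
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def build_vocab(alphabet: str, k: int) -> Dict[str, int]:
    """Map special tokens plus every possible k-mer over the alphabet to an integer id."""
    kmers = ["".join(p) for p in product(alphabet, repeat=k)]
    return {tok: i for i, tok in enumerate(SPECIAL_TOKENS + kmers)}


def encode(seq: str, k: int, vocab: Dict[str, int]) -> List[int]:
    """Tokenize a sequence into k-mer ids wrapped in <bos>/<eos>; unseen k-mers map to <unk>."""
    unk = vocab["<unk>"]
    ids = [vocab.get(tok, unk) for tok in kmer_tokenize(seq, k)]
    return [vocab["<bos>"]] + ids + [vocab["<eos>"]]


protein_vocab = build_vocab(PROTEIN_ALPHABET, 3)  # 4 special + 8000 three-mers
rna_vocab = build_vocab(RNA_ALPHABET, 6)          # 4 special + 4096 six-mers

if __name__ == "__main__":
    print(kmer_tokenize("MKTAYIA", 3))   # protein side, 3-mers
    print(kmer_tokenize("GGACUAGC", 6))  # aptamer side, 6-mers
    print(len(protein_vocab), len(rna_vocab))
```

Overlapping k-mers keep one token per sequence position, which preserves local context at the cost of redundancy. Passing `stride=k` instead gives non-overlapping chunks and shorter input sequences. Which variant the model uses is not specified here.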
Hsu et al. studied this question.