
April 25, 2026 | Open Access

Advances and trends in the bidirectional transformation between biological data and knowledge

Key Points

Systematically analyzes the bidirectional transformation between biological data and knowledge driven by AI technologies.
Examines the integration of AI in biological data processing and modeling.
Outlines the Data-to-Knowledge (D2K) and Knowledge-to-Data (K2D) trajectories.
Discusses the Learn-Design-Build-Test (LDBT) cycle as an operational framework.
Establishes that AI-ready data enhances the generation of testable biological insights.
Demonstrates the importance of model interpretability and of integrating models into the LDBT cycle.
Identifies ongoing challenges, including the reliability of AI-extracted knowledge and the need for standardized interfaces between the D2K and K2D stages.

Abstract

The life sciences research paradigm is undergoing a profound transformation from unidirectional data analysis towards a synergistic, closed-loop system of "data-model-knowledge-data". This evolution is driven chiefly by the pervasive integration of artificial intelligence technologies, which are recasting biological data from static repositories into programmable, designable intelligent entities. This paper systematically examines the bidirectional transformation between biological data and knowledge, highlighting the critical roles of AI-ready data, intelligent models, and the "Learn-Design-Build-Test" (LDBT) cycle. In the Data-to-Knowledge (D2K) trajectory, the journey begins with making data "AI-ready", that is, compliant with FAIR principles, stored in standardized formats, and semantically aligned with biological knowledge. High-quality, structured data from major databases such as PDB, NCBI, and GEO fuel sophisticated models, which learn patterns to generate statistical or correlative knowledge. The crucial next step, Model-to-Knowledge (M2K), translates model outputs into verifiable scientific knowledge, such as mechanistic hypotheses. Enhanced model interpretability and integration into the LDBT cycle are essential for this step, which moves beyond mere correlations to testable biological insights. Conversely, the Knowledge-to-Data (K2D) trajectory begins with Knowledge-to-Model (K2M), where established mechanistic, associative, or hypothetical knowledge is encoded into computational model architectures. This is exemplified by digital twins and virtual cell models, which embed biological priors as structural constraints. Subsequently, in Model-to-Data (M2D), these knowledge-informed models, including generative AI such as diffusion models, cross-omics translators, and single-cell foundation models, actively synthesize biologically plausible predictive or synthetic data. This addresses data scarcity and guides experimental design.
The LDBT paradigm forms the core operational engine that unifies these bidirectional paths, creating a spiraling, iterative relationship: data drives model learning, models distill knowledge, and knowledge feeds back to generate new data for training better models. However, challenges remain, including ensuring the reliability and reusability of AI-extracted knowledge, bridging the "conversion gap" between computational designs and successful experimental validation, and establishing standardized interfaces between the D2K and K2D stages. Looking forward, the bidirectional loop is posited as a fundamental methodological framework for tackling biological complexity and integrating multimodal data. Its systematic engineering, through continuous optimization of the LDBT cycle within research infrastructure, paves the way for the life sciences to advance into an era of predictive and designable intelligence. Future efforts must focus on building a robust AI-ready data foundation, developing next-generation algorithms that deeply integrate data and prior knowledge, and perfecting dry-wet lab integration for automated scientific discovery.
