This paper presents a comprehensive overview of the evolution of data science from a statistics-centric discipline to a machine learning–driven field, culminating in the current integration of large language models (LLMs). It identifies key limitations in traditional LLM applications—such as limited cross-domain adaptability, lack of interpretability, and workflow rigidity—and explores recent innovations addressing these challenges. Three representative frameworks—R&D-Agent, SPIO, and Agent Laboratory—illustrate LLMs’ transition from assistive tools to autonomous agents capable of planning, executing, and optimizing entire data science workflows. These systems leverage dual-agent cooperation, modular architectures, and self-correcting capabilities to improve performance in end-to-end data analysis and scientific research. The paper concludes by outlining future priorities, including domain-specific customization, standardized agent evaluation, and improved interpretability, all of which are essential for the next generation of intelligent, autonomous data science systems.
Xinyou Yin
Advances in Engineering Technology Research
Xinyou Yin studied this question.
www.synapsesocial.com/papers/68c1a76954b1d3bfb60e03b7 — DOI: https://doi.org/10.56028/aetr.14.1.1582.2025