Abstract
Background: While synoptic reports have standardized breast cancer data, a wealth of critical information remains locked in unstructured free text, including nuanced biomarker details and procedural notes. This data is not easily computable, which creates a significant barrier to understanding comprehensive population-level statistics and subtle clinical trends. Traditional data extraction methods are often insufficient to capture this complexity. We developed and validated a novel, flexible AI-powered system using an agent-based architecture to extract, normalize, and discretize this unstructured data, enabling previously infeasible population-scale analytics.
Methods: Our system employs a modular, agent-based framework in which each "agent" is a specialized large language model (LLM) tasked with a specific function in a sequential workflow. The process begins with an initiator agent that parses the synoptic report. Subsequent agents then perform targeted tasks, such as identifying and normalizing tumor characteristics, procedural details, and biomarker information from free-text fields. A final agent aggregates this information into a structured, computable format. To rigorously test our approach, we first developed a synthetic data generator to create realistic, anonymized report snippets with known ground-truth values. We then evaluated the performance of several leading LLMs of various sizes (e.g., gemini, llama3.3, phi4, deepseek-r1, gemma3) on this synthetic dataset.
Results: Our early analysis on simulated synoptic reports demonstrated high accuracy in data extraction and normalization. The gemini-1.5-pro-002 model achieved an overall accuracy of 96.9%, with a sensitivity of 99.3%, a specificity of 94.3%, and an F1-score of 97.0%. The llama3.3:70b model also performed strongly, with an accuracy of 98.7% (F1-score: 98.8%, sensitivity: 99.8%, specificity: 97.6%).
The gemma3:27b model was the weakest model tested (F1-score: 66.2%), performing substantially worse than the much smaller deepseek-r1:1.5b and phi4:14b models. A detailed error analysis of false positives and false negatives has allowed us to iteratively refine the agent logic. By the time of the symposium, we will have analyzed a large cohort of real, de-identified synoptic reports and will present comprehensive performance metrics and initial findings on population-level trends.
Conclusion: This AI-powered, agent-based system offers a scalable way to overcome the limitations of structured data in synoptic reports. By accurately extracting and normalizing previously inaccessible free-text information, our approach has the potential to provide clinicians and researchers with near real-time insights into population-level statistics and trends in breast cancer. This will facilitate a deeper understanding of disease patterns and ultimately support evidence-based advancements in patient care and clinical research.
Citation Format: S. N. Hart. A Novel Agent-Based AI System to Unlock Unstructured Data in Synoptic Reports for Advanced Population Health Analysis in Breast Cancer [abstract]. In: Proceedings of the San Antonio Breast Cancer Symposium 2025; 2025 Dec 9-12; San Antonio, TX. Philadelphia (PA): AACR; Clin Cancer Res 2026;32(4 Suppl):Abstract nr PS3-06-11.
S. N. Hart studied this question.
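The sequential agent workflow and the evaluation against known ground truth described in the abstract can be illustrated with a minimal sketch. This is not the author's implementation: every name here (`initiator_agent`, `biomarker_agent`, `aggregator_agent`, `run_pipeline`, `binary_metrics`), the "Field: value" report layout, and the two synthetic snippets are hypothetical, and a simple regular expression stands in for each LLM call so the example runs without any model. The metric formulas are the standard definitions of accuracy, sensitivity, specificity, and F1-score.

```python
import re
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Agent = Callable[[State], State]


def initiator_agent(state: State) -> State:
    """Parse 'Field: value' lines into a dict (hypothetical report layout)."""
    fields: Dict[str, str] = {}
    for line in state["report"].splitlines():
        if ":" in line:
            key, value = line.split(":", 1)
            fields[key.strip().lower()] = value.strip()
    return {**state, "fields": fields}


def biomarker_agent(state: State) -> State:
    """Normalize ER status from the free-text comment.

    Assumption: a regex stands in for the specialized LLM agent.
    """
    comment = state["fields"].get("comment", "")
    match = re.search(
        r"\bER\b[^.;]*?\b(positive|negative)\b", comment, re.IGNORECASE
    )
    er_status = match.group(1).lower() if match else None
    return {**state, "er_status": er_status}


def aggregator_agent(state: State) -> State:
    """Collect the normalized values into one structured record."""
    return {**state, "record": {"er_status": state.get("er_status")}}


def run_pipeline(report: str, agents: List[Agent]) -> Dict[str, Any]:
    """Run the agents in order, passing the accumulated state along."""
    state: State = {"report": report}
    for agent in agents:
        state = agent(state)
    return state["record"]


def binary_metrics(predicted: List[bool], truth: List[bool]) -> Dict[str, float]:
    """Score binary predictions against ground-truth labels."""
    pairs = list(zip(predicted, truth))
    tp = sum(p and t for p, t in pairs)
    fp = sum(p and not t for p, t in pairs)
    tn = sum(not p and not t for p, t in pairs)
    fn = sum(not p and t for p, t in pairs)

    def ratio(num: float, den: float) -> float:
        # Return 0.0 instead of dividing by zero when a class is absent.
        return num / den if den else 0.0

    sensitivity = ratio(tp, tp + fn)
    precision = ratio(tp, tp + fp)
    return {
        "accuracy": ratio(tp + tn, len(pairs)),
        "sensitivity": sensitivity,
        "specificity": ratio(tn, tn + fp),
        "f1": ratio(2 * precision * sensitivity, precision + sensitivity),
    }


AGENTS: List[Agent] = [initiator_agent, biomarker_agent, aggregator_agent]

if __name__ == "__main__":
    # Hypothetical synthetic snippets paired with ground truth (ER positive?).
    synthetic = [
        ("Procedure: lumpectomy\nComment: ER strongly positive, PR negative.", True),
        ("Procedure: mastectomy\nComment: ER negative; HER2 equivocal.", False),
    ]
    predicted = [
        run_pipeline(text, AGENTS)["er_status"] == "positive"
        for text, _ in synthetic
    ]
    truth = [label for _, label in synthetic]
    print(binary_metrics(predicted, truth))
```

Passing a shared state dictionary from agent to agent keeps each step independently replaceable, so in a real system any stand-in function could be swapped for an LLM-backed agent without changing `run_pipeline` or the scoring code.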