What does this research mean for the field?

Centralized implementation of the AI-based FLEX score allows a small team to efficiently screen thousands of scheduled surgeries and identify the highest-risk patients for preoperative optimization, overcoming the adoption barriers of distributed AI use. Novelty: methodological. Consensus alignment: challenges consensus.

What question did this study set out to answer?

The aim is to explore how centralized AI-enabled risk assessment can enhance surgical outcomes by simplifying integration into clinical workflows.

March 28, 2026 · Open Access

Rethinking the Integration of Artificial Intelligence Into Surgery: Centralized Risk Assessment for System-Level Impact

Key result

Centralized implementation of the AI-based FLEX score allowed a two-person team to screen over 2,200 scheduled surgeries, filtering over 100 weekly cases to the 10 highest-risk patients.

Key points

The aim is to explore how centralized AI-enabled risk assessment can enhance surgical outcomes by simplifying integration into clinical workflows.
Developed and validated the FLEX Score for preoperative risk prediction using EHR data.
Implemented a pilot program with a centralized approach involving a trained surgical team.
Engaged a single surgeon to lead decision-making based on FLEX Score outputs with support from a research assistant.
Centralized workflows successfully identified high-risk patients without delaying surgical schedules.
Over 2,200 GI surgeries screened, with focused reviews reducing the burden on surgeons.
Rapid buy-in and implementation across 30 surgeons following initial pilot results.
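
The screening step summarized in the points above (score every scheduled case, keep those above a predefined threshold, review from highest to lowest predicted risk) can be illustrated with a minimal sketch. This is not the authors' implementation: the `ScheduledCase` fields, the `triage` function, the threshold of 0.40, and the toy cases are all hypothetical, and the FLEX Score itself is treated as an external source of risk values.

```python
from dataclasses import dataclass


@dataclass
class ScheduledCase:
    patient_id: str          # hypothetical identifier
    risk: float              # hypothetical predicted risk in [0, 1]
    top_factors: tuple = ()  # hypothetical ranked contributing diagnoses


def triage(cases, threshold, max_reviews=None):
    """Return cases above the threshold, highest predicted risk first."""
    flagged = [c for c in cases if c.risk > threshold]
    flagged.sort(key=lambda c: c.risk, reverse=True)
    if max_reviews is None:
        return flagged
    return flagged[:max_reviews]


# Toy, made-up cases for illustration only; not data from the study.
cases = [
    ScheduledCase("A", 0.42, ("heart failure",)),
    ScheduledCase("B", 0.08),
    ScheduledCase("C", 0.61, ("prior colectomy", "chronic kidney disease")),
    ScheduledCase("D", 0.15),
]

# The threshold value here is an assumption; the source only says it was
# predefined in collaboration with the colorectal surgeons.
review_queue = triage(cases, threshold=0.40)
for case in review_queue:
    print(case.patient_id, case.risk, case.top_factors)
```

In this sketch the review queue is what a research assistant would work through for focused chart review; the optional `max_reviews` cap is one possible way to keep the weekly list to a fixed size.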

Structured PICO

Population

Patients scheduled for gastrointestinal (GI) surgery

Intervention

Centralized AI-enabled preoperative risk assessment using the FLEX score

Centralizing AI-enabled preoperative risk assessment to a small, trained team is a feasible and scalable approach to integrate AI into routine surgical workflows without burdening individual clinicians.

Numerical result

Absolute event rate: 0% vs 0%

Summary

INTRODUCTION

Accelerated innovation in artificial intelligence (AI) and the increasing availability of electronic health record (EHR) data have enabled increasingly accurate models of postoperative risk prediction.1–3 However, the performance and availability of these predictions alone have not translated into meaningful, system-level impact on surgical care. Traditional tools like the American College of Surgeons National Surgical Quality Improvement Program have exemplified this gap: despite being easily accessible, their use is limited by the need for manual data entry, modest accuracy in some subgroups, and weak integration with day-to-day surgical workflows.4 Many AI-based risk models now outperform these legacy scores, but in practice, they often join a growing collection of underused calculators and dashboards.5

Newer AI tools highlight the same pattern. For example, AI-based medical scribes promise to reduce documentation burden, but real-world uptake has been uneven because each clinician must both learn to operate the tool and understand its shortcomings, such as omissions or distortions of clinical content, to incorporate it into their workflows.6 When safe and effective AI use depends on distributed, individual-level expertise, adoption tends to be variable, fragile, and difficult to sustain.

Postoperative risk prediction models built on high-dimensional EHR data face analogous obstacles. Even when they outperform existing scores, translation into routine perioperative care is slowed by a lack of clinician familiarity with AI tools, poor interpretability of model outputs, and uncertainty about how models should drive clinical decisions.7,8 Again, putting the responsibility of implementation into the hands of individual surgeons and anesthesiologists leads to fractured adoption.

We argue that successful integration of AI into surgery does not require every clinician to become an AI expert.
Instead, one practical and scalable approach is to centralize AI-enabled risk assessment to a small number of clinically grounded decision-makers who (1) understand the AI model and its limitations, (2) can translate prediction into concrete perioperative actions, and (3) have the trust of their surgical colleagues. We illustrate this approach through our experience implementing the flexible surgical set embedding (FLEX) score, an AI-based preoperative risk prediction model, into the gastrointestinal (GI) surgery service at Massachusetts General Hospital.

TRANSLATING THE FLEX SCORE FROM A THEORETICAL ARTIFICIAL INTELLIGENCE MODEL TO DEPLOYABLE TOOL

Our team had previously developed and validated the FLEX Score, an AI-based risk prediction tool that uses routinely collected preoperative data to predict individualized postoperative risks, including postoperative mortality, 30-day hospital readmission, hospital length of stay, and discharge to non-home care.1 The FLEX Score was internally validated on retrospective data from Massachusetts General Hospital and demonstrated superior performance compared with the clinically validated Hospital Frailty Risk Score.9 Because this model only requires preoperative data routinely available weeks before surgery, we can run it automatically for all scheduled surgical patients without manual data entry and generate predictions far enough in advance to enable meaningful preoperative optimization.

RETHINKING ARTIFICIAL INTELLIGENCE ADOPTION IN SURGERY: FROM DISTRIBUTED USERS TO CENTRALIZED RISK ASSESSMENT

Most AI implementation strategies assume that each surgeon or anesthesiologist will directly interact with the model: logging into dashboards, interpreting outputs, and incorporating them into their own clinical decisions. However, this assumption conflates access with literacy and places the burden of learning, evaluation, and safe use on every individual clinician.
In a high-volume, time-pressured perioperative environment, this expectation is often unrealistic.

Our strategy took the opposite approach: rather than distributing responsibility for AI adoption across all clinicians, we centralized the use of the FLEX Score for preoperative risk assessment to a small, trained team. A single surgeon acted as the primary clinical decision-maker for FLEX Score outputs across the department, supported by a research assistant who prepared case summaries for high-risk patients flagged by the FLEX Score. This surgeon had credibility within the department and enough familiarity with the FLEX Score to understand both its strengths and limitations. Other surgeons were not asked to interact with the models themselves; instead, they engaged with a clinical process led by a trusted peer.

We began the integration of the FLEX Score with the 4 surgeons within Colorectal surgery to pilot the workflow on a manageable scale before expanding more broadly. We ran the FLEX Score for all colorectal procedures scheduled over the upcoming 30 days to generate patient-specific predictions for adverse postoperative outcomes. A unique aspect of the FLEX Score is that it couples these overall predictions with a ranked list of the patient’s diagnoses and past surgical history that contributed significantly to each patient’s elevated risk. Equipped with these risk scores, the research assistant then reviews the patient list from highest to lowest predicted risk, focusing on those above a predefined threshold created in collaboration with the colorectal surgeons. For each patient, the assistant performs a focused chart review concentrating on diagnoses highlighted by the FLEX Score and elements not fully captured in the model, such as nuanced descriptions of functional status or frailty in clinical notes or socioeconomic barriers that might affect perioperative management.
The assistant then prepares a concise summary of each patient, incorporating the FLEX Score predictions and manual chart review, for the surgeon to review. Because the reviewer is a practicing surgeon, they can interpret FLEX Score predictions within the context of procedure-specific risks, patient goals, and local resource constraints. For each high-risk patient, the surgeon decides whether to refer the patient to our hospital’s dedicated Perioperative Optimization of Senior Health clinic for further evaluation and optimization before surgery. This clinic provides frail patients with a comprehensive geriatric assessment and interdisciplinary case review with geriatrics, surgery, anesthesiology, physical therapy, and nutrition. A recent study of colorectal patients treated at the Perioperative Optimization of Senior Health clinic found the most common interventions were advanced care planning, physical therapy, medication management, and dietary counseling.10

SCALING FROM PILOT TO DEPARTMENT-WIDE IMPLEMENTATION

Following a 2-month pilot with Colorectal surgery, we presented our initial experience implementing the FLEX Score at several surgery department meetings, highlighting that our workflows did not introduce any scheduling delays, did not require any additional work among surgeons, and referred our highest-risk patients for further optimization before surgery. Given that the process was led by a colleague whose clinical judgement they already trusted, buy-in was rapid, and within several days, our centralized risk assessment workflows using the FLEX Score expanded to the entire GI surgery department, made up of over 30 surgeons. Over a 4-month period from May 1 to August 30, 2025, we used this centralized approach to screen more than 2,200 scheduled GI surgeries.
Notably, we were able to achieve this drastic expansion with the same single surgeon and research assistant review team, highlighting the potential for centralization to support system-level impact without a proportional increase in personnel time. Each week, more than 100 patients were scheduled for GI surgery, but the FLEX Score reliably filtered this list to roughly the 10 highest-risk patients, reducing the review burden from a full case list to a focused subset for potential referral to preoperative optimization. We are currently working to expand this workflow to other departments such as Urology and Orthopedic surgery.

LESSONS LEARNED AND EMERGING PRINCIPLES

Our experience suggests that centralizing AI-enabled preoperative risk assessment in a small, trusted team can make adoption feasible in surgical environments. Automated stratification with the FLEX Score allowed a single surgeon supported by a research assistant to identify the highest-risk patients across all scheduled surgeries within a whole department and connect them with resources for preoperative optimization. Most surgeons did not need to learn how to use the model or interpret outcomes; they needed only to understand the workflow and trust a colleague with the same surgical expertise.

Despite the insights from our work, the successful and scalable incorporation of future AI-based tools into clinical workflows will require substantial buy-in from EHR companies. Although major vendors provide some mechanisms to access clinical data and deploy new AI tools, the data available is often restricted, and the actual steps to translate developed tools into elements within the EHR system are not available to standard users. In our setting, these constraints meant that implementation of the FLEX Score had to operate outside the EHR.
As Medicare’s Transforming Episode Accountability Model (TEAM) and similar value-based payment reforms increasingly tie reimbursement to identification and optimization of at-risk older surgical patients, methods for integrating AI tools to flag these patients will be critical not only for clinical care, but also for the financial sustainability of health systems.

Building on this experience with GI surgery, we are now piloting centralized AI use in other perioperative contexts. In Vascular surgery, FLEX Score predictions are being explored as a tool to integrate patient-level information into the decision between open and endovascular surgical approaches. In the preoperative anesthesia clinic, we are testing the FLEX Score’s unique ability to identify significant past medical and surgical factors that contribute to postoperative risk predictions to build surgery-specific problem lists, allowing more time to be spent assessing patients rather than on chart review. Together, these efforts highlight how thoughtfully designed, centralized workflows may be a practical path for moving AI from proof-of-concept models to routine surgical practice.

ACKNOWLEDGMENTS

We want to thank all members of the Kunitake and Purdon research groups who helped develop the FLEX Score and worked with us to integrate it into surgical workflows. We also thank the leaders of the surgical departments who partnered with us to integrate AI into clinical workflows, as well as the surgeons and anesthesiologists whose engagement made large-scale adoption possible across our medical centers.
