What is the clinical evidence from this study?

Study design: Observational (retrospective comparative). Population: Adult cancer patients (n=205). Intervention: ChatGPT (GPT-4.1) vs. an oncologist and a SEER-based survival calculator. Primary outcome: Per-patient binary accuracy (0-4 correct timepoints), win-loss comparison between ChatGPT and the oncologist (p=0.299).

What does this research mean for the field?

A GPT-based large language model provides comparable or superior prognostic survival estimates compared to oncologists using a single unstructured clinical note, particularly for long-term outcomes in advanced cancer. Novelty: novel finding. Consensus alignment: neutral.

What question did this study set out to answer?

This study aims to compare the prognostic accuracy of ChatGPT, oncologists, and SEER-based survival calculators for cancer patients.

May 29, 2026

Comparing oncologists, a GPT-based model, and a SEER-based survival calculator for cancer prognostication.

Key Result

ChatGPT (GPT-4.1) was numerically but not statistically superior to oncologists in per-patient binary accuracy for survival prediction (25.4% vs 20.0% win rate; p=0.299).

Key Points

This study aims to compare the prognostic accuracy of ChatGPT, oncologists, and SEER-based survival calculators for cancer patients.
Conducted a retrospective comparative study with 205 adult cancer patients at a safety net clinic.
Utilized ChatGPT and an oncologist to generate survival predictions based on deidentified clinical notes.
Performed statistical analysis on binary accuracy, Brier scores, and calibration metrics.
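ChatGPT won more per-patient binary accuracy comparisons than the oncologist (52 vs. 41), but the difference was not statistically significant (p=0.299).
In secondary analyses (n=189), ChatGPT had significantly lower Brier scores, indicating better overall accuracy, at 1 year (p < 0.001) and 2 years (p = 0.030).
By stage, the oncologist was superior in Stage I disease (p = 0.036), while ChatGPT significantly outperformed the oncologist in Stage IV disease at 2- and 5-year horizons (both p < 0.001).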
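
The Brier score cited in the key points is the mean squared difference between a predicted probability and the observed 0/1 outcome, so lower is better. The sketch below shows the calculation only. The probabilities and outcomes are made-up illustrative values, not study data, and the function name is hypothetical. The study collected predictions on a 0–100% scale, which would be divided by 100 first.

```python
def brier_score(probs, outcomes):
    """Mean squared error between predicted survival probabilities (0-1)
    and observed outcomes (1 = alive, 0 = deceased)."""
    if not probs or len(probs) != len(outcomes):
        raise ValueError("probs and outcomes must be non-empty and equal length")
    return sum((p - o) ** 2 for p, o in zip(probs, outcomes)) / len(probs)


# Hypothetical 1-year data for five patients (illustration only).
alive_at_1y = [1, 1, 0, 1, 0]
model_probs = [0.9, 0.8, 0.3, 0.7, 0.2]
clinician_probs = [0.6, 0.9, 0.6, 0.5, 0.5]

model_brier = brier_score(model_probs, alive_at_1y)
clinician_brier = brier_score(clinician_probs, alive_at_1y)
print(f"model: {model_brier:.3f}, clinician: {clinician_brier:.3f}")
```

A forecaster who always answers 50% scores 0.25, which is a useful reference point when reading Brier scores.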

Study Design

Type

Observational (n=205)

Multicenter

Structured PICO

Does ChatGPT improve prognostic accuracy compared to oncologists and a SEER-based calculator in adult cancer patients?

Population

205 adult cancer patients treated in a safety net cancer clinic (25 stage I, 53 stage II, 53 stage III, 74 stage IV; 53% male).

Intervention

ChatGPT (GPT-4.1) generating binary and probabilistic predictions of survival at 6 months, 1, 2, and 5 years based on one deidentified clinical note from diagnosis.

Comparator

Oncologist unfamiliar with the patient and a publicly available SEER-based cancer-specific survival calculator (CancerSurvivalRates.com).

Outcome

Per-patient binary accuracy (0–4 correct timepoints) comparison between ChatGPT and the oncologist.

ChatGPT demonstrated comparable or superior prognostic estimates for clinically relevant time points compared to oncologists, particularly in long-term outcomes for advanced disease.
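
To make the primary outcome concrete, here is a minimal sketch of how a per-patient score of 0–4 correct timepoints could be turned into a win, loss, or tie between the two predictors. The record layout, function names, and the three example patients are all hypothetical. The abstract does not describe the exact scoring procedure, so this is one plausible reading of it.

```python
TIMEPOINTS = ("6mo", "1y", "2y", "5y")


def calls(a, b, c, d):
    """Bundle four alive (True) / deceased (False) calls by timepoint."""
    return dict(zip(TIMEPOINTS, (a, b, c, d)))


def correct_timepoints(predicted, actual):
    """Number of timepoints (0-4) where the binary call matches the outcome."""
    return sum(predicted[t] == actual[t] for t in TIMEPOINTS)


def tally_wins(patients):
    """Count patients where each predictor had more correct timepoints."""
    tally = {"model": 0, "oncologist": 0, "tie": 0}
    for patient in patients:
        m = correct_timepoints(patient["model"], patient["actual"])
        o = correct_timepoints(patient["oncologist"], patient["actual"])
        if m > o:
            tally["model"] += 1
        elif o > m:
            tally["oncologist"] += 1
        else:
            tally["tie"] += 1
    return tally


# Hypothetical patients (illustration only, not study data).
patients = [
    {"actual": calls(True, True, False, False),
     "model": calls(True, True, False, False),       # 4 correct
     "oncologist": calls(True, True, True, False)},  # 3 correct
    {"actual": calls(True, True, True, True),
     "model": calls(True, True, True, False),        # 3 correct
     "oncologist": calls(True, True, True, True)},   # 4 correct
    {"actual": calls(True, False, False, False),
     "model": calls(True, True, False, False),       # 3 correct
     "oncologist": calls(True, True, False, False)}, # 3 correct
]

result = tally_wins(patients)
print(result)
```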

Main Result

Per-patient win rate (ChatGPT vs. oncologist): 25.4% vs 20.0%

p-value: p=0.299
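
The win rates and p-value above can be checked against the counts given in the abstract (52 ChatGPT wins and 41 oncologist wins among 205 patients). The sketch below assumes tied patients are dropped and a two-sided exact binomial (sign) test against a 50/50 split is applied to the remaining 93. The abstract says only that "exact binomial methods" were used, so the tie handling and the two-sided rule are assumptions, and the function name is hypothetical.

```python
from math import comb


def exact_binomial_two_sided(k, n, p=0.5):
    """Two-sided exact binomial p-value: total probability of all outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    # Small relative tolerance guards against floating-point ties.
    return min(1.0, sum(q for q in pmf if q <= observed * (1 + 1e-9)))


total_patients = 205
model_wins, oncologist_wins = 52, 41
decided = model_wins + oncologist_wins  # ties excluded (assumption)

model_win_rate = 100 * model_wins / total_patients
oncologist_win_rate = 100 * oncologist_wins / total_patients
p_value = exact_binomial_two_sided(model_wins, decided)

print(f"win rates: {model_win_rate:.1f}% vs {oncologist_win_rate:.1f}%")
print(f"two-sided exact binomial p = {p_value:.3f}")
```

Under these assumptions the result should land close to the reported p=0.299, which suggests the published test was of this general form.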

Abstract

Abstract 1621

Background: Prognostic estimates guide cancer treatment planning and goals-of-care discussions. Clinicians often rely on population-based survival statistics (e.g., SEER), which may not reflect individualized risk. Large language models (LLMs) such as ChatGPT may offer more personalized estimates, but their performance relative to oncologists and population-based tools remains unclear.

Methods: We conducted a retrospective comparative study of 205 adult cancer patients treated in a safety net cancer clinic. For each patient, one deidentified clinical note from the time of diagnosis was provided to a HIPAA-compliant instance of ChatGPT (GPT-4.1) and to an oncologist unfamiliar with the patient. Both generated binary (alive/deceased) and probabilistic (0–100%) predictions of survival at 6 months, 1, 2, and 5 years. The primary endpoint was the per-patient binary accuracy (0–4 correct timepoints) comparison between ChatGPT and the oncologist. Secondary outcomes (n = 189) included Brier scores and calibration metrics compared with oncologists and a publicly available SEER-based cancer-specific survival calculator (CancerSurvivalRates.com; CSR), and subgroup analyses by cancer stage. Significance testing used exact binomial methods for per-patient win–loss comparisons and paired nonparametric tests to compare probabilistic performance across methods.

Results: Of the 205 patients, 25, 53, 53, and 74 were stages I, II, III, and IV, respectively. Gender was balanced (53% male). All surviving patients had at least 5 years of follow-up. Notes varied in final staging and treatment plans, as some patients were in their initial evaluation. In the primary analysis (N = 205), ChatGPT was numerically, but not statistically, superior to the oncologist (52 vs. 41, p = 0.299). In secondary analyses (n = 189), ChatGPT had superior overall accuracy, with lower Brier scores at 1 year (p < 0.001) and 2 years (p = 0.030). Calibration analyses showed that at 5 years, ChatGPT achieved near-ideal reliability (calibration slope 1.018), whereas oncologists demonstrated overconfidence (slope 0.535). As expected, cancer-specific survival by CSR was significantly higher than overall survival (OS) estimates from oncologists or ChatGPT. Stage-stratified analyses revealed oncologist superiority in Stage I disease (p = 0.036), while ChatGPT significantly outperformed oncologists in Stage IV disease at 2- and 5-year horizons (both p < 0.001).

Conclusions: For a safety-net cancer clinic, using one unstructured note, ChatGPT demonstrated comparable or superior prognostic estimates for clinically relevant time points as compared to oncologists, particularly in long-term outcomes for advanced disease. Future studies should evaluate cancer-specific survival and prognosis after or during treatment.
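
The calibration slopes in the abstract (1.018 vs. 0.535) can be read with the help of a small example. One common definition, assumed here because the abstract does not state the method, fits a logistic regression of the observed outcome on the logit of the predicted probability. A slope near 1 means the predictions are about as extreme as the data warrant, and a slope below 1 means they are too extreme (overconfident). The sketch below fits that regression with Newton's method on made-up grouped data. Function names and data are hypothetical.

```python
import math


def logit(p):
    return math.log(p / (1 - p))


def sigmoid(z):
    return 1 / (1 + math.exp(-z))


def calibration_fit(probs, outcomes, max_iter=50, eps=1e-6):
    """Fit outcome ~ intercept + slope * logit(prob) by Newton's method.
    Returns (intercept, slope)."""
    xs = [logit(min(max(p, eps), 1 - eps)) for p in probs]
    a, b = 0.0, 1.0  # start at perfect calibration
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, outcomes):
            mu = sigmoid(a + b * x)
            w = mu * (1 - mu)
            g0 += y - mu          # gradient w.r.t. intercept
            g1 += (y - mu) * x    # gradient w.r.t. slope
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        da = (h11 * g0 - h01 * g1) / det
        db = (h00 * g1 - h01 * g0) / det
        a, b = a + da, b + db
        if abs(da) + abs(db) < 1e-12:
            break
    return a, b


def grouped(prob, n_alive, n_total):
    """Expand a group of patients who all received the same prediction."""
    return [prob] * n_total, [1] * n_alive + [0] * (n_total - n_alive)


# Hypothetical overconfident forecaster: predicts 10% / 50% / 90%,
# but the observed survival in those groups is 20% / 50% / 80%.
probs, outcomes = [], []
for prob, n_alive in ((0.1, 2), (0.5, 5), (0.9, 8)):
    p, o = grouped(prob, n_alive, 10)
    probs += p
    outcomes += o

intercept, slope = calibration_fit(probs, outcomes)
print(f"intercept = {intercept:.3f}, slope = {slope:.3f}")
```

In this toy data the fitted slope is below 1 because the predictions of 10% and 90% are more extreme than the observed 20% and 80%, which is the same pattern the abstract describes as overconfidence.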

Cite This Study

Huang et al. conducted an observational study in cancer patients (n=205). ChatGPT (GPT-4.1) vs. an oncologist and a SEER-based survival calculator was evaluated on per-patient binary accuracy (0-4 correct timepoints) using a win-loss comparison. ChatGPT (GPT-4.1) was numerically but not statistically superior to oncologists in per-patient binary accuracy for survival prediction (25.4% vs 20.0% win rate; p=0.299).

synapsesocial.com/papers/6a192d2dfab5b468c4415f97 https://doi.org/10.1200/jco.2026.44.16_suppl.1621
