Background OpenAI developed ChatGPT as an advanced artificial intelligence (AI)-driven natural language processing system. ChatGPT generates responses through statistical pattern recognition established during pretraining.
Objective To ascertain whether ChatGPT could respond to patients with breast cancer in a way that was consistent with evidence-based medical practice and with a breast cancer clinical guideline (a practical pocket book based on the latest evidence and on national data), and to evaluate the ability of AI to provide accurate and up-to-date information to patients, potentially serving as a supplementary resource for medical professionals.
Methods The research team designed a series of tests to assess the responses of ChatGPT to specific questions related to breast cancer diagnosis, treatment options, and post-treatment care. Thirty clinically validated breast cancer questions spanning diagnosis, prognosis, treatment, and pharmacotherapy were administered in three iterative trials to (1) GPT-3.5 and GPT-4.0 (5-minute interval between trials) and (2) three breast surgeons stratified by expertise (high, medium, low). Each response was scored dichotomously against the standard answer (1 = guideline-consistent; 0 = inconsistent), and the three scores for each question were summed to give a total of 0 to 3 per question. Data analysis included mean score comparisons (analysis of variance with post hoc Tukey tests), subgroup analyses by question category, and inter-rater reliability assessment.
Results Comparison of GPT-3.5 and GPT-4.0 across breast surgery subspecialties and question types showed that GPT-4.0 generally outperformed GPT-3.5, although the difference in mean scores was not significant for most items.
We found that GPT-3.5 had the same medical response ability as lower-qualified breast surgeons, while GPT-4.0 had the same ability as higher-qualified breast surgeons.
Shi et al. (Sun,) studied this question.
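The scoring scheme described in the Methods can be illustrated with a minimal Python sketch. It assumes that the 0 to 3 total for a question is the sum of dichotomous marks over the three trials, and it computes only the one-way analysis of variance F statistic (the post hoc Tukey test is not reproduced here). The function names `question_score` and `one_way_anova_f` and all marks in `marks` are hypothetical and are not data or code from the study.

```python
from statistics import mean


def question_score(trial_consistent):
    """Sum dichotomous marks (1 = guideline-consistent, 0 = inconsistent)
    over the three trials of one question, giving a total of 0 to 3."""
    if len(trial_consistent) != 3:
        raise ValueError("expected exactly three trials per question")
    return sum(1 if ok else 0 for ok in trial_consistent)


def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA over lists of per-question scores."""
    all_scores = [s for g in groups for s in g]
    grand_mean = mean(all_scores)
    k = len(groups)       # number of responders being compared
    n = len(all_scores)   # total number of scored questions
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((s - mean(g)) ** 2 for g in groups for s in g)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Hypothetical marks for four questions (assumption, NOT study data):
# each tuple holds the guideline-consistency of the three trials.
marks = {
    "GPT-3.5": [(True, False, True), (False, False, True),
                (True, True, True), (True, False, False)],
    "GPT-4.0": [(True, True, True), (True, False, True),
                (True, True, True), (True, True, False)],
}

scores = {name: [question_score(q) for q in qs] for name, qs in marks.items()}
means = {name: mean(s) for name, s in scores.items()}
f_stat = one_way_anova_f(list(scores.values()))
```

In practice the F statistic would be compared against the F distribution to obtain a p value, and pairwise differences between responders would be examined with the Tukey test named in the Methods.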