What is the clinical evidence from this study?

Study design: Other. Population: Older, low-income Chinese individuals in the U.S. with type 2 diabetes. Intervention: Six generative AI platforms, compared against Hong Kong Health Bureau and ADA guidelines. Primary outcome: Semantic similarity (SS) score between AI-generated responses and guideline-based reference answers (average SS scores ranged from 0.747 to 0.806 across platforms).

What does this research mean for the field?

Generative AI platforms produce dietary advice for older, low-income Chinese patients with type 2 diabetes that substantially overlaps with established clinical guidelines, though variability across platforms means they should not be used as a sole information source. Novelty: Incremental. Consensus alignment: Neutral.

What question did this study set out to answer?

This study aims to assess how well generative AI dietary advice aligns with specific diabetes dietary guidelines for older, low-income Chinese individuals.

June 7, 2026

2767-LB: Evaluation of Generative AI Dietary Advice for Older, Low-Income Chinese Population with Type 2 Diabetes in the U.S.

Key Result

Generative AI platforms provided dietary advice for older, low-income Chinese patients with T2DM that substantially overlapped with established guidelines (average semantic similarity scores ranged from 0.747 to 0.806 across platforms).

Key Points

Six generative AI platforms were tested with ten diabetes diet-related questions pertinent to older, low-income Chinese individuals.
Reference responses were derived from Hong Kong Health Bureau and ADA guidelines, using NotebookLM to convert guideline content into structured answers that served as the gold standard.
AI-generated responses were compared with these reference answers using semantic similarity analysis, with scores averaged across three trials per question.
Average semantic similarity scores across the platforms ranged from 0.747 to 0.806.
These scores indicate substantial overlap with established diabetes dietary guidelines.
Variability among platforms suggests a need for structured evaluation frameworks to assess the alignment and reliability of Gen AI health information.
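The abstract does not say which model or metric produced the SS scores. As a rough illustration of the scoring and averaging steps only, the sketch below assumes cosine similarity between a vector representation of each AI answer and its reference answer, averaged first over the three trials per question and then over the questions for each platform. To stay runnable with the standard library, it substitutes simple bag-of-words term counts for the sentence-embedding model a real analysis would more likely use. The function names, the one-reference-per-question layout, and the toy data are hypothetical.

```python
import math
import re
from collections import Counter
from statistics import mean


def tokenize(text: str) -> Counter:
    """Lowercase term-frequency vector (bag of words); an assumed stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity of two term-frequency vectors; non-negative counts keep it within [0, 1]."""
    va, vb = tokenize(a), tokenize(b)
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0


def platform_score(responses: dict[str, list[str]], references: dict[str, str]) -> float:
    """Average SS for one platform (hypothetical data layout).

    responses: question_id -> answers from repeated trials (three per question in the study).
    references: question_id -> guideline-based reference answer (assumed one per question).
    """
    # Step 1: average over trials within each question.
    per_question = [
        mean(cosine_similarity(answer, references[q]) for answer in trials)
        for q, trials in responses.items()
    ]
    # Step 2: average over questions to get the platform-level score.
    return mean(per_question)


if __name__ == "__main__":
    # Toy data for illustration only; not from the study.
    references = {"q1": "Choose brown rice and limit white rice portions."}
    responses = {
        "q1": [
            "Limit white rice and choose brown rice.",
            "Eat more vegetables.",
            "Choose brown rice over white rice.",
        ]
    }
    print(round(platform_score(responses, references), 3))
```

Because every question has the same number of trials, averaging per question first and then across questions gives the same result as a flat mean over all 30 responses per platform; the two-step form simply keeps each question's contribution explicit.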

Structured PICO

Do Generative AI platforms provide dietary advice that aligns with established guidelines for older, low-income Chinese populations with T2DM?

Population

Evaluation of 6 Generative AI platforms answering 10 diabetes diet-related questions tailored to older, low-income Chinese populations with T2DM in the U.S.

Exposure

Six Generative AI platforms

Comparator

Reference responses generated from Hong Kong Health Bureau (HK) and ADA guidelines using NotebookLM

Outcome

Semantic similarity (SS) analysis scores between AI-generated responses and guideline-based reference answers

Generative AI platforms show substantial overlap with established diabetes dietary guidelines for older, low-income Chinese populations, but variability exists, indicating they should not be relied upon as the sole source of information.

Main Result

Effect estimate: SS average scores ranging from 0.747 to 0.806

Limitations

Variability across platforms underscores the need for structured evaluation frameworks to systematically assess the alignment, reliability, and cultural relevance of Gen AI health information.

Abstract

Introduction and Objective: Generative AI (Gen AI) platforms are increasingly used for health information, yet the alignment of their outputs with culturally specific diabetes dietary guidelines remains unclear. This study evaluates the extent to which AI-generated dietary advice aligns with established guidelines from the Hong Kong Health Bureau (HK) and the ADA for the older, low-income Chinese population with T2DM.

Methods: Six Gen AI platforms were prompted with 10 common diabetes diet-related questions tailored to this demographic group. Reference responses were generated from HK and ADA guidelines using NotebookLM to convert guideline content into structured answers, which served as the gold standard. AI-generated responses were then compared with the guideline-based reference answers using semantic similarity (SS) analysis. Scores (between 0 and 1) were averaged across three trials per question.

Results: AI-generated responses demonstrated average SS scores ranging from 0.747 to 0.806 across platforms.

Conclusion: The results suggest substantial overlap with established diabetes dietary guidelines. Variability across platforms underscores the need for structured evaluation frameworks to systematically assess the alignment, reliability, and cultural relevance of Gen AI health information. Until such validated frameworks are established, Gen AI tools should not be relied upon as the sole source of dietary guidance in this population.

Disclosure: E. Chiu: None. C. Young: Advisory Panel; Ended; Sanofi.

Funding: Touro University California College of Osteopathic Medicine, IRAP-SR Award.
