What question did this study set out to answer?

The study aims to evaluate the efficacy of AI in automating documentation tasks during psychiatric interviews.

February 14, 2026

Open Access

AI-generated documentation of psychiatric interviews: a proof-of-concept study

Key Points

The study aims to evaluate the efficacy of AI in automating documentation tasks during psychiatric interviews.
Simulated psychiatric interviews were transcribed and summarized using an AI model.
AI reports were compared against human-written reports and a gold standard.
Transcription accuracy (word error rate, Levenshtein score), report performance (accuracy, F1 score), and inter-rater reliability (Cohen’s κ, percent agreement, Gwet’s AC1) were assessed.
Reports were coded into binary items based on a predefined codebook.
AI transcription achieved a mean word error rate of 9.44%.
Inter-rater reliability was high overall, but human reports agreed more closely with the gold standard than AI reports did.
Mean accuracy for human reports was 0.94 compared to 0.78 for AI reports (p = .003).
Human reports had higher F1 scores (M = 0.89) than AI reports (M = 0.55, p = .001).
AI reports occasionally included more detailed information than human reports but also contained clinically relevant inaccuracies.
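
As an illustration of the metrics named in the key points, the following minimal sketch computes a word error rate for a transcript and the accuracy and F1 score of a binary-coded report against a gold standard. It is not the study's code. The function names, the whitespace tokenization, and the example data are assumptions made for illustration, and the study's text normalization and codebook are not reproduced here.

```python
from typing import Sequence, Tuple


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumption: lowercasing and whitespace splitting stand in for whatever
    text normalization the study actually applied.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


def accuracy_and_f1(gold: Sequence[int], report: Sequence[int]) -> Tuple[float, float]:
    """Accuracy and F1 of binary-coded report items against a gold standard."""
    if len(gold) != len(report) or not gold:
        raise ValueError("gold and report must be non-empty and equally long")

    tp = sum(1 for g, r in zip(gold, report) if g == 1 and r == 1)
    tn = sum(1 for g, r in zip(gold, report) if g == 0 and r == 0)
    fp = sum(1 for g, r in zip(gold, report) if g == 0 and r == 1)
    fn = sum(1 for g, r in zip(gold, report) if g == 1 and r == 0)

    accuracy = (tp + tn) / len(gold)
    denominator = 2 * tp + fp + fn
    f1 = 2 * tp / denominator if denominator else 0.0
    return accuracy, f1


# Hypothetical data, not taken from the study.
wer = word_error_rate("the patient reports low mood", "the patient report low mood")
acc, f1 = accuracy_and_f1([1, 1, 0, 0, 1, 0], [1, 0, 0, 1, 1, 0])
print(f"WER={wer:.2f} accuracy={acc:.2f} F1={f1:.2f}")
```

F1 ignores true negatives, so a report that misses many positive items can keep a fairly high accuracy while its F1 score drops. This may be one reason the gap between human and AI reports is wider for F1 than for accuracy.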

Abstract

Background The documentation process in psychiatric interviews is laborious and often compromises the quality of patient care. Addressing this challenge, we explored the potential of artificial intelligence (AI) to automate documentation tasks and improve efficiency in psychiatric practice.

Methods Six simulated psychiatric interviews were transcribed and summarized using an AI model and compared to a gold standard, together with reports written by humans. Reports were decomposed into binary items using a predefined codebook covering patient information, current complaints, psychiatric history, medical history, medication, substance use, social history, family history, vegetative symptoms, psychopathology, and preliminary diagnoses. Transcription accuracy, performance, and inter-rater reliability were evaluated.

Results The AI achieved a high transcription accuracy with a mean word error rate of 9.44% and a Levenshtein score of 0.996, aligning with current voice-to-text transcription standards. Inter-rater reliability was high overall. The mean Cohen’s κ was 0.80 (SD = 0.33), the mean percent agreement was 0.96 (SD = 0.07), and the mean Gwet’s AC1 was 0.93 (SD = 0.12). Across all categories, human reports showed substantially higher agreement with the gold standard than AI reports. The mean accuracy was 0.94 (SD = 0.01) for human reports and 0.78 (SD = 0.08) for AI reports, t(5) = 6.33, p = .003. The mean F1 scores were also higher for human reports (M = 0.89, SD = 0.02) than for AI reports (M = 0.55, SD = 0.13), t(5) = 7.38, p = .001. Occasionally, AI reports provided more detailed contextual information than human reports. However, AI reports also introduced clinically relevant inaccuracies and struggled in complex domains such as psychopathology.

Conclusions While our findings suggest promising prospects for AI-driven documentation in psychiatry, further development is essential to enhance the model’s ability to comprehensively assess and document psychopathological features. Importantly, some AI-generated inaccuracies were clinically significant, underscoring the necessity of a final clinical review by a qualified professional. These findings are limited by the very small number of highly controlled simulated interviews. Larger studies with real patients, diverse clinicians, and routine clinical workflows will be required. Nonetheless, AI-supported documentation has the potential to considerably reduce time demands and alleviate the documentation burden in psychiatric care.
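
The three inter-rater statistics reported in the abstract can be sketched for two raters coding the same binary items. This is a minimal illustration using the standard two-rater formulas, not the study's analysis code. The function name and the example ratings are hypothetical.

```python
import math
from typing import Dict, Sequence


def agreement_stats(rater_a: Sequence[int], rater_b: Sequence[int]) -> Dict[str, float]:
    """Percent agreement, Cohen's kappa, and Gwet's AC1 for two raters, binary items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and equally long")
    n = len(rater_a)

    # Observed agreement: share of items both raters coded identically.
    p_observed = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n

    # Share of items each rater coded as present (1).
    p_a = sum(rater_a) / n
    p_b = sum(rater_b) / n

    # Cohen's kappa: chance agreement from each rater's own marginals.
    p_chance_kappa = p_a * p_b + (1 - p_a) * (1 - p_b)
    if math.isclose(p_chance_kappa, 1.0):
        kappa = float("nan")  # undefined when both raters use a single category
    else:
        kappa = (p_observed - p_chance_kappa) / (1 - p_chance_kappa)

    # Gwet's AC1: chance agreement from the mean prevalence across raters.
    prevalence = (p_a + p_b) / 2
    p_chance_ac1 = 2 * prevalence * (1 - prevalence)
    ac1 = (p_observed - p_chance_ac1) / (1 - p_chance_ac1)

    return {"percent_agreement": p_observed, "kappa": kappa, "ac1": ac1}


# Hypothetical ratings with a skewed prevalence (most items coded as present).
stats = agreement_stats(
    [1, 1, 1, 1, 1, 1, 1, 1, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1, 1, 0],
)
print({name: round(value, 3) for name, value in stats.items()})
```

In this example the raters agree on 9 of 10 items, yet κ is about 0.62 while AC1 is about 0.87. Cohen's κ is known to drop when one category dominates, and this may explain part of the gap between the reported mean κ of 0.80 and mean AC1 of 0.93. The abstract itself does not state a cause.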

Authors

Bengican Gülegen

Raoul Haaf

Emanuel Schlüßler

Journals

SHILAP Revista de lepidopterología

Frontiers in Psychiatry

Institutions

Charité - Universitätsmedizin Berlin

Digital Equipment (Germany)

St. Joseph-Krankenhaus
