Background The documentation process in psychiatric interviews is laborious and often compromises the quality of patient care. Addressing this challenge, we explored the potential of artificial intelligence (AI) to automate documentation tasks and improve efficiency in psychiatric practice.
Methods Six simulated psychiatric interviews were transcribed and summarized using an AI model and compared to a gold standard, together with reports written by humans. Reports were decomposed into binary items using a predefined codebook covering patient information, current complaints, psychiatric history, medical history, medication, substance use, social history, family history, vegetative symptoms, psychopathology, and preliminary diagnoses. Transcription accuracy, performance, and inter-rater reliability were evaluated.
Results The AI achieved a high transcription accuracy with a mean word error rate of 9.44% and a Levenshtein score of 0.996, aligning with current voice-to-text transcription standards. Inter-rater reliability was high overall. The mean Cohen’s κ was 0.80 (SD = 0.33), the mean percent agreement was 0.96 (SD = 0.07), and the mean Gwet’s AC1 was 0.93 (SD = 0.12). Across all categories, human reports showed substantially higher agreement with the gold standard than AI reports. The mean accuracy was 0.94 (SD = 0.01) for human reports and 0.78 (SD = 0.08) for AI reports, t(5) = 6.33, p = .003. The mean F1 scores were also higher for human reports (M = 0.89, SD = 0.02) than for AI reports (M = 0.55, SD = 0.13), t(5) = 7.38, p = .001. Occasionally, AI reports provided more detailed contextual information than human reports. However, AI reports also introduced clinically relevant inaccuracies and struggled in complex domains such as psychopathology.
Conclusions While our findings suggest promising prospects for AI-driven documentation in psychiatry, further development is essential to enhance the model’s ability to comprehensively assess and document psychopathological features. Importantly, some AI-generated inaccuracies were clinically significant, underscoring the necessity of a final clinical review by a qualified professional. These findings are limited by the very small number of highly controlled simulated interviews. Larger studies with real patients, diverse clinicians, and routine clinical workflows will be required. Nonetheless, AI-supported documentation has the potential to considerably reduce time demands and alleviate the documentation burden in psychiatric care.
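For readers unfamiliar with the metrics named in the Results, the following is a minimal sketch of how they are commonly computed. It is not the authors' analysis code: the function names `word_error_rate` and `binary_agreement`, the whitespace tokenization, and the toy inputs are all assumptions made for illustration. The sketch treats each report as a vector of binary codebook items (1 = item present, 0 = absent) and applies the standard two-rater formulas for Cohen's κ and Gwet's AC1.

```python
from typing import Dict, Sequence


def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Assumption: words are split on whitespace with no further normalization.
    """
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edit distance between the ref words seen so far and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def binary_agreement(a: Sequence[int], b: Sequence[int]) -> Dict[str, float]:
    """Agreement between two binary item vectors.

    `a` is treated as the reference (e.g. gold standard) for accuracy and F1;
    kappa and AC1 are symmetric in `a` and `b`.
    """
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    tp = sum(1 for x, y in zip(a, b) if x == 1 and y == 1)
    tn = sum(1 for x, y in zip(a, b) if x == 0 and y == 0)
    fp = sum(1 for x, y in zip(a, b) if x == 0 and y == 1)
    fn = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)

    po = (tp + tn) / n  # observed agreement (equals accuracy here)
    f1 = 2 * tp / (2 * tp + fp + fn) if (2 * tp + fp + fn) else 0.0

    # Cohen's kappa: chance agreement from each rater's own marginals.
    pa, pb = (tp + fn) / n, (tp + fp) / n
    pe_kappa = pa * pb + (1 - pa) * (1 - pb)
    kappa = (po - pe_kappa) / (1 - pe_kappa) if pe_kappa < 1 else float("nan")

    # Gwet's AC1 (two raters, two categories): chance agreement from the
    # average positive rate; pe_ac1 is at most 0.5, so no division by zero.
    pi = (pa + pb) / 2
    pe_ac1 = 2 * pi * (1 - pi)
    ac1 = (po - pe_ac1) / (1 - pe_ac1)

    return {"accuracy": po, "f1": f1, "kappa": kappa, "ac1": ac1}


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    gold = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
    report = [1, 0, 0, 0, 1, 0, 0, 0, 1, 0]
    print(binary_agreement(gold, report))
    print(word_error_rate("the patient reports low mood",
                          "the patient report low mood today"))
```

On imbalanced binary items, where most codebook entries are absent in most reports, Cohen's κ can be low even when percent agreement is high. Gwet's AC1 is less sensitive to this, which is one common reason for reporting both, as the abstract does.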
Bengican Gülegen
Raoul Haaf
Emanuel Schlüßler
SHILAP Revista de lepidopterología
Frontiers in Psychiatry
Charité - Universitätsmedizin Berlin
Digital Equipment (Germany)
St. Joseph-Krankenhaus
Gülegen et al. studied this question.
www.synapsesocial.com/papers/699010382ccff479cfe56c3e — DOI: https://doi.org/10.3389/fpsyt.2026.1621532