What question did this study set out to answer?

This research aims to evaluate the external validity of a mobile support app designed for family caregivers using expert assessments.

February 28, 2026 | Open Access

Evaluating the external validity of an artificial intelligence-based mobile support app for caregiving relatives by an online expert survey

Key Points

The study was conducted as an online survey involving nine external nursing experts.
Participants rated the appropriateness of personalized topic recommendations on a five-point Likert scale.
The data were analyzed descriptively to assess whether the system's recommendations align with the experts' views.
Spearman’s Rho was used for pairwise rank correlation of deviating topic sequences.
Additional topics suggested by experts were checked against the topics available in the system to identify false negatives.
Participants rated topic appropriateness at an average of 4.4 (median 5) across 495 ratings.
The system demonstrated high personalization performance, with a precision of 0.965 and a recall of 0.986.
Experts largely agreed with one another; the rare disagreements with the system concerned topic ordering and produced no single alternative sequence.
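
The precision and recall figures in the key points can be illustrated with a minimal Python sketch. It assumes that each topic is treated as either relevant or not relevant for a given fictitious caregiver; how the authors mapped Likert ratings and expert suggestions onto that binary decision is not stated on this page. The function name `precision_recall` and the topic labels are hypothetical and the data are toy values, not the study's.

```python
def precision_recall(recommended, relevant):
    """Return (precision, recall) for one recommendation list.

    recommended: topics the system offered (hypothetical labels)
    relevant:    topics the experts consider appropriate (assumed binary)
    """
    recommended, relevant = set(recommended), set(relevant)
    true_pos = len(recommended & relevant)   # offered and appropriate
    false_pos = len(recommended - relevant)  # offered but not appropriate
    false_neg = len(relevant - recommended)  # appropriate but not offered
    precision = true_pos / (true_pos + false_pos) if recommended else 0.0
    recall = true_pos / (true_pos + false_neg) if relevant else 0.0
    return precision, recall


# Toy example: four topics offered, one of them judged inappropriate,
# and one appropriate topic missing from the list (a false negative).
system_topics = ["topic_a", "topic_b", "topic_c", "topic_d"]
expert_topics = ["topic_a", "topic_b", "topic_c", "topic_e"]
precision, recall = precision_recall(system_topics, expert_topics)
print(f"precision={precision:.3f}, recall={recall:.3f}")
```

In this framing, a false negative is a topic that exists in the system and that an expert asked for, but that the personalization did not offer for that caregiver.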

Abstract

Mobile Care Backup (MoCaB) is a support app for family caregivers that provides textual information on topics personalized to their specific care situation. Personalization is performed by an artificial intelligence-based expert system. Here, we present the evaluation of the expert system’s validity with nursing experts external to the project. Furthermore, we discuss the general limitations of an online survey as an evaluation methodology for expert systems. This study was conducted as an online survey in German and English. A total of nine experts, all of whom were female and had extensive (outpatient) care experience, were included. The participants were presented with descriptions of multiple fictitious family caregivers and the system’s personalized list of topics. They were then asked to rate the appropriateness of the topics on a five-point Likert scale and to suggest additional topics. The collected data were analyzed descriptively to investigate whether MoCaB’s topic recommendation strategy aligns with the views of project-external experts. For deviating topic sequences, the consensus of the experts was verified by pairwise rank correlation using Spearman’s Rho. Additional suggested topics were checked to see whether they were part of the system but not provided (false negatives). In the 495 submitted ratings, participants rated the suggested topics’ appropriateness relatively high, with an average rating of 4.4 and a median of 5. This indicates that participants consider most of the recommended topics important for the fictitious family caregiver. The system’s personalization performance was high (precision of 0.965 and recall of 0.986). Overall, the experts are unanimous. In the rare cases where experts disagreed with the system’s ordering of topics, no single alternative sequence emerged. The MoCaB system’s external validity is high, and isolated inconsistencies will be resolved in the project group.
Using an online survey to evaluate the system’s validity with external experts is complex and time-consuming. Participants need a very high degree of competence, as they must infer the content from the title alone. Nevertheless, it is an essential step in the evaluation process of expert systems and, if carried out correctly, can identify weak spots and further improve the expert system.
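
The pairwise rank correlation mentioned in the abstract can be sketched as follows. This is an illustration under stated assumptions, not the authors' analysis code: it assumes every expert orders the same set of topics without ties, which allows the closed-form Spearman formula, and the names `spearman_rho`, `pairwise_rho`, and the expert and topic labels are hypothetical toy values.

```python
from itertools import combinations


def spearman_rho(order_a, order_b):
    """Spearman's rho for two orderings of the same items (no ties assumed)."""
    if len(set(order_a)) != len(order_a) or set(order_a) != set(order_b):
        raise ValueError("both orderings must contain the same unique items")
    n = len(order_a)
    if n < 2:
        raise ValueError("need at least two items to correlate ranks")
    rank_b = {item: rank for rank, item in enumerate(order_b)}
    # Sum of squared rank differences between the two orderings.
    d_squared = sum((rank - rank_b[item]) ** 2 for rank, item in enumerate(order_a))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


def pairwise_rho(orderings):
    """Compute rho for every pair of raters in a {rater: ordering} mapping."""
    return {
        (a, b): spearman_rho(orderings[a], orderings[b])
        for a, b in combinations(orderings, 2)
    }


# Toy orderings proposed by three hypothetical experts for four topics.
proposed = {
    "expert_1": ["t1", "t2", "t3", "t4"],
    "expert_2": ["t1", "t3", "t2", "t4"],
    "expert_3": ["t4", "t3", "t2", "t1"],
}
for pair, rho in pairwise_rho(proposed).items():
    print(pair, round(rho, 3))
```

Values near 1 for most pairs would point to a shared alternative ordering among the experts, while values scattered across the range are consistent with the reported finding that no single alternative sequence emerged.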

Cite This Study

Wolff et al. studied this question.

synapsesocial.com/papers/69a288060a974eb0d3c03e83
https://doi.org/10.1186/s12911-026-03407-2
