Mobile Care Backup (MoCaB) is a support app for family caregivers that provides textual information on topics personalized to their specific care situation. Personalization is performed by an artificial intelligence-based expert system. Here, we present the evaluation of the expert system's validity with project-external nursing experts. Furthermore, we discuss the general limitations of an online survey as an evaluation methodology for expert systems. This study was conducted as an online survey in German and English. A total of nine experts, all of whom were female and had extensive (outpatient) care experience, were included. The participants were presented with descriptions of multiple fictitious family caregivers and the system's personalized list of topics for each. They were then asked to rate the appropriateness of the topics on a five-point Likert scale and to suggest additional topics. The collected data were analyzed descriptively to investigate whether MoCaB's topic recommendation strategy aligns with the judgment of project-external experts. For deviating topic sequences, the consensus among the experts was checked by pairwise rank correlation using Spearman's rho. Additionally suggested topics were checked to see whether they were already part of the system but had not been recommended (false negatives). In the 495 submitted ratings, participants rated the appropriateness of the suggested topics relatively high, with an average rating of 4.4 and a median of 5. This indicates that participants considered most of the recommended topics important for the fictitious family caregiver. The system's personalization performance was high (precision of 0.965 and recall of 0.986). Overall, the experts were in agreement. In the rare cases where experts disagreed with the system's ordering of topics, no single alternative sequence emerged. The MoCaB system's external validity is high, and isolated inconsistencies will be resolved in the project group.
Using an online survey to evaluate the system's validity with external experts is complex and time-consuming. Participants need a very high degree of competence, as they must infer the content of a topic from its title. Nevertheless, it is an essential step in the evaluation process of expert systems and, if carried out correctly, can identify weak spots and further improve the expert system.
Wolff et al. studied this question.
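The quantities named in the abstract (precision, recall, and pairwise Spearman's rho between expert orderings) can be illustrated with a minimal Python sketch. It is not the study's analysis code. The function names, the topic labels, and the expert rankings are hypothetical, and the abstract does not state how Likert ratings were mapped to relevant and non-relevant topics, so the sketch simply takes two topic sets as given.

```python
from itertools import combinations
from statistics import mean


def precision_recall(recommended, relevant):
    """Precision and recall of a recommended topic set against a relevant set.

    Assumption: 'relevant' is whatever the experts judged appropriate;
    the mapping from Likert ratings to this set is not given in the abstract.
    """
    recommended, relevant = set(recommended), set(relevant)
    true_positives = len(recommended & relevant)
    precision = true_positives / len(recommended) if recommended else 0.0
    recall = true_positives / len(relevant) if relevant else 0.0
    return precision, recall


def average_ranks(values):
    """1-based ranks of the values; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return float("nan")  # undefined when one ordering is constant
    return cov / (var_x * var_y) ** 0.5


def pairwise_rho(rankings):
    """Spearman's rho for every pair of experts in a {name: ranks} mapping."""
    return {
        (a, b): spearman_rho(rankings[a], rankings[b])
        for a, b in combinations(sorted(rankings), 2)
    }


if __name__ == "__main__":
    # Hypothetical topics and expert orderings, for illustration only.
    print(precision_recall(["t1", "t2", "t3", "t4"], ["t1", "t2", "t3", "t5"]))
    experts = {
        "expert_a": [1, 2, 3, 4],  # position each expert gives to topics t1..t4
        "expert_b": [1, 2, 3, 4],
        "expert_c": [4, 3, 2, 1],
    }
    for pair, rho in pairwise_rho(experts).items():
        print(pair, round(rho, 3))
```

Pairwise coefficients close to 1 across all expert pairs would indicate a shared alternative ordering, while a scatter of low or negative values corresponds to the situation described above, in which no single alternative sequence emerges.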