Key points are not available for this paper at this time.
We found wide variation in the interrater reliability of measures used to evaluate apps, and some measures are more robust across categories of apps than others. The measures with the highest degree of interrater reliability tended to be those that involved the least rater discretion. Clinical quality measures such as effectiveness, ease of use, and performance had relatively poor interrater reliability. Subsequent research is needed to determine consistent means for evaluating the performance of apps. Patients and clinicians should consider conducting their own assessments of apps, in conjunction with evaluating information from reviews.
Powell et al. (Wed,) studied this question.