What question did this study set out to answer?

This study aims to compare the effectiveness of generic and extended rating scales in assessing basic abdominal ultrasound skills.

March 16, 2026 | Open Access

Abdominal ultrasound performance assessment: a comparison of generic and extended OSCE rating scales

Key Points

This study aims to compare the effectiveness of generic and extended rating scales in assessing basic abdominal ultrasound skills.
Conducted a single-centre cohort study with 80 medical students
Utilized both extended and generic rating scales in parallel during the OSCE
Evaluated performance across five domains: image settings, transducer handling, examination technique, image explanation, overall performance
Generic rating scale showed higher internal consistency (mean Cronbach’s α 0.803 vs. 0.699)
Higher generalisability for the generic scale (Phi 0.529 vs. 0.466)
Performance ratings were significantly lower on the generic scale (75.38% vs. 77.99%)
Only two stations showed significant difficulty differences between the scales

Abstract

Objective Structured Clinical Examinations (OSCEs) are a widely used tool for assessing ultrasound competence, yet the optimal format for their rating scales remains debated. Extended, task-specific rating scales may offer detailed guidance, while more generic scales offer broader applicability and ease of use. This study examines whether and how ratings differ when an extended versus a generic rating scale is used to assess performance in basic abdominal ultrasound skills. In this single-centre cohort study, 80 medical students participated in an OSCE for abdominal ultrasound that was rated with both an extended and a generic rating scale in parallel. Five domains were evaluated: image settings, transducer handling, examination technique, image explanation and overall performance. Each OSCE station was rated by two assessors simultaneously, one using each scale. The generic rating scale demonstrated significantly higher internal consistency (mean Cronbach’s α generic 0.803 vs. extended 0.699; p = 0.011) and higher generalisability (Phi generic 0.529 vs. extended 0.466). Absolute ratings on the generic rating scale were significantly lower (performance P generic 75.38% vs. extended 77.99%; p < 0.001), though station difficulty differed significantly between the scales for only two stations. Overall, the generic rating scale produced more stringent ratings, with higher internal consistency and a larger share of variance attributable to participants’ performance. Therefore, a well-designed generic rating scale can reliably and efficiently assess basic abdominal ultrasound performance, possibly offering greater generalisability and simpler implementation than extended, task-specific rating scales.
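
As a rough illustration of the internal-consistency statistic reported above, the sketch below computes Cronbach’s α with the standard formula α = k/(k − 1) · (1 − Σ item variances / variance of the total score), where k is the number of rated items. The function name, the data layout and the ratings are hypothetical and are not taken from the study; the authors’ actual analysis software and settings are not described here.

```python
from statistics import variance  # sample variance (n - 1 denominator)


def cronbach_alpha(scores):
    """Cronbach's alpha for a table of ratings.

    scores: list of rows, one row per examinee, one column per rated item.
    This layout is an assumption made for the example.
    """
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two examinees")

    # Variance of each item (column) across examinees.
    item_variances = [variance(column) for column in zip(*scores)]

    # Variance of each examinee's total score.
    total_variance = variance([sum(row) for row in scores])

    return n_items / (n_items - 1) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 5 examinees x 4 items on a 1-5 scale (not study data).
hypothetical_ratings = [
    [4, 5, 4, 4],
    [3, 3, 2, 3],
    [5, 4, 5, 5],
    [2, 3, 2, 2],
    [4, 4, 3, 4],
]

alpha = cronbach_alpha(hypothetical_ratings)
print(f"Cronbach's alpha: {alpha:.3f}")
```

Values closer to 1 indicate that the items of a scale vary together across examinees, which is the sense in which the generic scale (mean α 0.803) was more internally consistent than the extended scale (mean α 0.699).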
