March 1, 1993

On the Stability of Performance Assessments

Abstract

This study examined the stability of scores on two types of performance assessments, an observed hands‐on investigation and a notebook surrogate. Twenty‐nine sixth‐grade students in a hands‐on inquiry‐based science curriculum completed three investigations on two occasions separated by 5 months. Results indicated that: (a) the generalizability across occasions for relative decisions was, on average, moderate for the observed investigations (.52) and the notebooks (.50); (b) the generalizability for absolute decisions was only slightly lower; (c) the major source of measurement error was the person by occasion (residual) interaction; and (d) the procedures students used to carry out the investigations tended to change from one occasion to the other.
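The distinction between relative and absolute decisions in results (a) and (b) can be illustrated with a minimal sketch. It assumes a simplified one-facet persons-by-occasions design, which is not the study's full design (the study also involved three investigations), and it uses the standard ANOVA estimators of variance components. The function name `g_coefficients` and the toy scores are hypothetical and are not data from the paper.

```python
def g_coefficients(scores, n_occasions=None):
    """Relative and absolute generalizability coefficients for a
    persons x occasions design (simplifying assumption, see lead-in).

    scores: list of rows, one row per person, one column per occasion.
    n_occasions: number of occasions averaged over in the decision study;
                 defaults to the number of occasions observed.
    """
    n_p = len(scores)
    n_o = len(scores[0])
    if n_occasions is None:
        n_occasions = n_o

    grand = sum(sum(row) for row in scores) / (n_p * n_o)
    person_means = [sum(row) / n_o for row in scores]
    occasion_means = [sum(row[j] for row in scores) / n_p for j in range(n_o)]

    # Mean squares from a two-way ANOVA with one observation per cell.
    ms_p = n_o * sum((m - grand) ** 2 for m in person_means) / (n_p - 1)
    ms_o = n_p * sum((m - grand) ** 2 for m in occasion_means) / (n_o - 1)
    ms_res = sum(
        (scores[i][j] - person_means[i] - occasion_means[j] + grand) ** 2
        for i in range(n_p)
        for j in range(n_o)
    ) / ((n_p - 1) * (n_o - 1))

    # Variance components; negative estimates are set to zero.
    var_res = ms_res  # person x occasion interaction, confounded with residual
    var_p = max((ms_p - ms_res) / n_o, 0.0)
    var_o = max((ms_o - ms_res) / n_p, 0.0)

    # Relative error ignores the occasion main effect; absolute error includes it.
    rel_error = var_res / n_occasions
    abs_error = (var_o + var_res) / n_occasions

    relative = var_p / (var_p + rel_error) if var_p + rel_error > 0 else 0.0
    absolute = var_p / (var_p + abs_error) if var_p + abs_error > 0 else 0.0
    return relative, absolute


# Hypothetical toy scores: 3 persons x 2 occasions (not from the study).
toy = [[1, 2], [3, 4], [5, 6]]
relative, absolute = g_coefficients(toy)
print(relative, absolute)
```

In this sketch the absolute coefficient can never exceed the relative one, because its error term adds the occasion main effect to the person-by-occasion (residual) component. A large residual component lowers both coefficients.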
