Digital exams are becoming increasingly common and are sometimes offered alongside traditional paper-based assessments. Exam boards may take several approaches to implementing a digital exam, including transferring paper-based assessments to an on-screen format. Regardless of the approach taken, comparability across assessment modes must be maintained for fairness. This article uses Differential Item Functioning (DIF) analysis to examine whether 795 items from 31 exam components (across a range of qualifications and subjects) show an assessment mode effect. These exams were delivered via the Cambridge University Press & Assessment Digital Mocks Service and were based on previous live exam papers (that is, the digital and paper-based exams asked candidates the same questions). All items were also coded using an item characteristics framework to investigate whether items with certain characteristics (e.g., heavy reading demand) were more likely to exhibit DIF. About one in five items showed DIF. While the presence of DIF does not necessarily imply that a mode effect exists, it does indicate that the item is not functioning as expected. Furthermore, our findings suggest that DIF was unlikely to be systematically associated with particular item characteristics, with two exceptions. Firstly, DIF was more common amongst items requiring numeric or mathematical entry as their answer than amongst items with other characteristics, with more of these items being harder on paper. Secondly, amongst items requiring text entry as their answer, a disproportionate number of the DIF items were harder on paper. The research findings should be treated as indicative rather than definitive for several reasons, including that candidates were not randomly assigned to the two modes, that data were lacking for certain item characteristics, and that it is difficult to isolate the effect of each item characteristic on candidate performance across modes.
Lim et al. studied this question.
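To make the idea of a DIF analysis across assessment modes concrete, the following is a minimal sketch of one widely used procedure, the Mantel-Haenszel common odds ratio. The text above does not state which DIF method the study used, so this is an illustration under assumptions, not the authors' procedure. It assumes a dichotomously scored item (correct or incorrect), candidates matched on a total-score band, and paper as the reference group. The names `mantel_haenszel_dif` and `expand`, and the example counts, are hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records, reference="paper"):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    records: iterable of (mode, stratum, correct) tuples, where mode is the
    assessment mode, stratum is the matching score band, and correct is 0 or 1.
    Returns (alpha_mh, delta_mh), or None if the odds ratio is undefined.
    """
    # One 2x2 table per stratum:
    # [reference correct, reference incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for mode, stratum, correct in records:
        offset = 0 if mode == reference else 2
        tables[stratum][offset + (0 if correct else 1)] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n    # reference correct x focal incorrect
        denominator += b * c / n  # reference incorrect x focal correct

    if numerator == 0 or denominator == 0:
        return None  # odds ratio undefined for these counts

    alpha = numerator / denominator
    # ETS delta scale: negative values mean the item is harder for the
    # focal group (here, the non-reference mode) after matching on ability.
    delta = -2.35 * math.log(alpha)
    return alpha, delta


def expand(mode, stratum, n_correct, n_incorrect):
    """Hypothetical helper: turn counts into individual response records."""
    return ([(mode, stratum, 1)] * n_correct
            + [(mode, stratum, 0)] * n_incorrect)


# Made-up counts for illustration only (not data from the study).
example = (
    expand("paper", "low", 8, 2) + expand("digital", "low", 5, 5)
    + expand("paper", "high", 8, 2) + expand("digital", "high", 5, 5)
)
print(mantel_haenszel_dif(example))
```

An odds ratio near 1 (delta near 0) suggests that matched candidates performed similarly in both modes on that item. Values further from 1 flag the item for review and, as noted above, do not on their own establish a mode effect.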