What question did this study set out to answer?

This study examines whether items in digital exams function systematically differently from the same items in paper-based exams.

February 27, 2026. Open Access.

Mode effects in digital versus paper-based exams and their relationship with item characteristics

Key Points

The study ran Differential Item Functioning (DIF) analysis on 795 exam items from 31 exam components.
All items were coded using an item characteristics framework to test whether certain characteristics (e.g., heavy reading demand) made DIF more likely.
The digital and paper-based exams asked candidates the same questions, allowing direct comparison.
About 20% of items showed DIF, indicating that they were not functioning as expected, though not necessarily that a mode effect exists.
Items requiring a numeric or mathematical answer showed DIF more often than other items, with more of these items being harder on paper.
Items requiring text entry also had a disproportionate number of DIF items that were harder on paper.

Abstract

Digital exams, which are becoming increasingly common, are sometimes offered alongside traditional paper-based assessments. Exam boards may adopt several approaches to implement a digital exam, including transferring paper-based assessments to an on-screen format. Regardless of the approach taken, it is essential to maintain comparability across assessment modes for fairness. This article examines, using Differential Item Functioning (DIF) analysis, whether 795 items from 31 exam components (in a range of qualifications and subjects) show an assessment mode effect. These exams were delivered via the Cambridge University Press & Assessment Digital Mocks Service and were based on previous live exam papers (that is, digital and paper-based exams asked candidates the same questions). All items were also coded using an item characteristics framework to investigate whether items with certain characteristics (e.g., heavy reading demand) were more likely to exhibit DIF. About one in five items showed DIF. While the presence of DIF does not necessarily imply that a mode effect exists, it does indicate that the item is not functioning as expected. Furthermore, our findings suggest that DIF was unlikely to be systematically associated with particular item characteristics. There were two exceptions. Firstly, DIF was more common amongst items requiring numeric or mathematical entry as their answer than amongst items with other characteristics, with more of these items being harder on paper. Secondly, items requiring text entry as their answer also had a disproportionate number of DIF items that were harder on paper. The research findings should be treated as indicative rather than definitive for several reasons, including that candidates were not randomly assigned to the two modes, that data were lacking for certain item characteristics, and that it is difficult to isolate the effect of each item characteristic on candidate performance across modes.
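The abstract does not say which DIF procedure the authors used. Purely as an illustration of what a DIF check involves, the sketch below applies the Mantel-Haenszel method, one widely used option, to a single right/wrong item. It assumes dichotomous scoring, treats paper as the reference mode and digital as the focal mode, and matches candidates on total exam score; the function name `mantel_haenszel_dif` and the record layout are hypothetical and not taken from the paper.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(records):
    """Mantel-Haenszel DIF statistics for one dichotomous item.

    records: iterable of (mode, total_score, correct) tuples, where mode is
    "paper" (reference group) or "digital" (focal group), total_score is the
    matching variable, and correct is True/False for this item.
    Returns (alpha_mh, mh_d_dif).
    """
    # One 2x2 table per score level:
    # [paper correct, paper incorrect, digital correct, digital incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for mode, score, correct in records:
        offset = 0 if mode == "paper" else 2
        tables[score][offset + (0 if correct else 1)] += 1

    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in tables.values():
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n

    if numerator == 0 or denominator == 0:
        raise ValueError("too little data to estimate the common odds ratio")

    # Common odds ratio: values above 1 mean the item was easier on paper
    # for candidates with the same total score.
    alpha_mh = numerator / denominator
    # ETS delta scale: negative values mean the item was harder in the
    # focal (digital) mode, positive values mean it was harder on paper.
    mh_d_dif = -2.35 * math.log(alpha_mh)
    return alpha_mh, mh_d_dif
```

Matching on total score is what separates DIF from a plain difference in pass rates: two candidates of similar overall attainment are compared, so a gap between modes on one item points at the item rather than at the cohorts. Because candidates in the study were not randomly assigned to modes, even a flagged item remains indicative rather than conclusive, as the abstract notes.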

Cite This Study

Lim et al. studied this question.

synapsesocial.com/papers/69a1353eed1d949a99abef71 https://doi.org/10.17863/cam.127732
