What is the clinical evidence from this study?

Study design: Cross-sectional. Population: BFI-2 respondents (n=450 across Studies 2 and 3). Intervention: Original BFI-2 rating scale vs. alternative rating scale. Primary outcome: Extreme scores on personality scales.

June 13, 2025

Open Access

Agree or agree a little? The rating scale of the BFI-2 causes extreme responses

Key Result

The original BFI-2 rating scale causes extreme scores and leads to significantly different personality trait scores compared to alternative rating scales.

Study Design

Type

Cross-Sectional (n=450)

Multicenter

Yes

Structured PICO

Population

Respondents in Study 1, 150 Norwegian students in Study 2, and 300 U.S. residents in Study 3

Intervention

Original rating scale of the BFI-2 (using 'Disagree a little' and 'Agree a little' for intermediate points)

Comparator

Alternative rating scale (using 'Disagree' and 'Agree' for intermediate points)

Outcome

Extreme scores and differentiation in the upper range of personality traits (patient reported)

The wording of the intermediate scale points in the BFI-2 rating scale significantly impacts the extremity of responses and test scores, complicating cross-study comparisons.

Limitations

Researchers usually do not report the scale used, making it challenging to compare results across studies.

Abstract

The Big Five Inventory-2 (BFI-2) is one of the most frequently used personality measures in research, but several scales provide extreme scores. The present research shows that extreme scores are caused by the original rating scale, which uses the labels 'Disagree a little' and 'Agree a little' for the intermediate scale points 2 and 4, respectively. In contrast, other major personality inventories (and some BFI-2 versions) use the labels 'Disagree' and 'Agree', respectively. In Study 1, respondents assessed the agreement expressed by the response labels in each scale. In Study 2, 150 Norwegian students completed the BFI-2 twice, applying both rating scales on the same occasion. In Study 3, 300 U.S. residents completed the English BFI-2 once, with one of the two scales. The original rating scale in the BFI-2 seems to cause extreme scores, and a graded response model reveals that it does not differentiate in the upper range of some personality traits. Moreover, the two different rating scales, which are both used with the BFI-2 (and the BFI), lead to significantly different scores on several personality scales. As researchers usually do not report the scale used, it can be challenging to compare results across studies.

• The BFI-2 is used by researchers around the world to measure the Big Five of personality.
• Several personality scales provide extreme scores, and do not differentiate well.
• Two response scales are used with the BFI-2, and they provide different test scores.
• The original rating scale causes the extreme scores, due to inappropriate wording.
• It is often impossible to know which response scale has been used in a particular study.
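
The abstract refers to a graded response model without spelling it out. As a rough illustration only, the sketch below computes category probabilities for a single five-point item under Samejima's graded response model. The function name, the discrimination value, and both threshold sets are hypothetical and are not taken from the study; they only show how a lower top threshold makes the highest response category easier to reach, so that respondents with different high trait levels end up giving the same extreme answer.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities P(X = k | theta) for one item under a
    graded response model.

    theta      -- latent trait level of the respondent
    a          -- item discrimination (hypothetical value in this sketch)
    thresholds -- increasing list of category thresholds b_1 < ... < b_m;
                  the item then has m + 1 response categories
    """
    # Cumulative probabilities P(X >= k), bracketed by 1 for the lowest
    # category and 0 beyond the highest category.
    cumulative = (
        [1.0]
        + [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
        + [0.0]
    )
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Hypothetical item parameters, chosen for illustration only.
DISCRIMINATION = 1.5
SPREAD_THRESHOLDS = [-2.0, -1.0, 0.0, 1.0]       # top category is hard to reach
COMPRESSED_THRESHOLDS = [-2.0, -1.0, -0.5, 0.0]  # top category is easy to reach

if __name__ == "__main__":
    for theta in (0.5, 1.5, 2.5):
        spread_top = grm_category_probs(theta, DISCRIMINATION, SPREAD_THRESHOLDS)[-1]
        compressed_top = grm_category_probs(theta, DISCRIMINATION, COMPRESSED_THRESHOLDS)[-1]
        print(f"theta={theta:+.1f}  P(top | spread)={spread_top:.3f}  "
              f"P(top | compressed)={compressed_top:.3f}")
```

With the compressed thresholds, the probability of the top category is already high at moderate trait levels, which is one way an item can fail to differentiate in the upper range. Whether this matches the parameter pattern reported in the study is an assumption.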