April 14, 2024 | Open Access

Pseudo Factor Analysis of Language Embedding Similarity Matrices: New Ways to Model Latent Constructs

Abstract

This article builds on recent work using Large Language Models (LLMs) in psychometrics, in particular the use of sentence transformer models to generate pseudo-discrimination parameters. Pseudo-discrimination parameters are discrimination estimates that correlate with their empirical counterparts but are obtained without collecting response data. Whereas earlier work examined pseudo-discrimination on an item-by-construct basis, we introduce and evaluate pseudo-factor analysis. Pseudo-factor analysis is a model-based approach to generating the parameters of a latent construct measurement model, such as the number of construct dimensions and the relations between factors and their indicators. Like pseudo-discrimination, pseudo-factor analysis does not require response data. The approach factor analyzes the matrix of cosine similarities among scale (or item) language embeddings. Across two studies that used a variety of transformer models and three encoding approaches (atomic, atomic reversed, and one-pop), pseudo-factor analyses of the NEO and HEXACO personality inventories recovered the theoretically expected structures, and these pseudo-factor structures were strongly related to the established empirical factor structures. We provide a Python Shiny application for calculating pseudo-factor analysis discrimination parameters and related psychometric estimates.
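To make the core idea concrete, here is a minimal sketch of factor analyzing a cosine similarity matrix of item embeddings, assuming only NumPy. Several choices are our own assumptions and are not stated in the abstract: random vectors stand in for real sentence-transformer embeddings (in practice these might come from the sentence-transformers `SentenceTransformer.encode` method); the number of factors is picked with the Kaiser criterion (eigenvalues greater than 1); loadings come from iterated principal axis factoring followed by varimax rotation; and the hypothetical helper `loadings_to_discrimination` applies one common loading-to-slope conversion, a = D·λ / sqrt(1 − h²). This is an illustration of the technique, not the authors' implementation or their Shiny application's method.

```python
import numpy as np


def cosine_similarity_matrix(embeddings):
    """Pairwise cosine similarities between rows of an (items x dims) array."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    return unit @ unit.T


def kaiser_n_factors(r):
    """Number of eigenvalues of r greater than 1 (Kaiser criterion)."""
    return int(np.sum(np.linalg.eigvalsh(r) > 1.0))


def principal_axis_factoring(r, n_factors, max_iter=200, tol=1e-6):
    """Iterated principal axis factoring of a correlation-like matrix r."""
    # Initial communalities: squared multiple correlations.
    h2 = 1.0 - 1.0 / np.diag(np.linalg.pinv(r))
    for _ in range(max_iter):
        reduced = r.copy()
        np.fill_diagonal(reduced, h2)  # replace unit diagonal with communalities
        eigvals, eigvecs = np.linalg.eigh(reduced)
        top = np.argsort(eigvals)[::-1][:n_factors]
        loadings = eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))
        new_h2 = np.sum(loadings**2, axis=1)
        converged = np.max(np.abs(new_h2 - h2)) < tol
        h2 = new_h2
        if converged:
            break
    return loadings


def varimax(loadings, max_iter=100, tol=1e-6):
    """Orthogonal varimax rotation (standard SVD-based algorithm)."""
    p, k = loadings.shape
    rotation = np.eye(k)
    criterion = 0.0
    for _ in range(max_iter):
        lam = loadings @ rotation
        target = lam**3 - lam @ np.diag(np.sum(lam**2, axis=0)) / p
        u, s, vt = np.linalg.svd(loadings.T @ target)
        rotation = u @ vt
        new_criterion = np.sum(s)
        if new_criterion < criterion * (1 + tol):
            break
        criterion = new_criterion
    return loadings @ rotation


def loadings_to_discrimination(loadings, d=1.702):
    """Hypothetical helper: slopes a = D * lambda / sqrt(1 - h^2).

    D = 1.702 approximates the logistic metric; D = 1 gives the normal-ogive metric.
    """
    h2 = np.sum(loadings**2, axis=1, keepdims=True)
    return d * loadings / np.sqrt(1.0 - h2)


# Hypothetical stand-in for real embeddings: 8 items, two "constructs",
# each item = construct centroid + item-specific noise.
rng = np.random.default_rng(0)
centroids = rng.standard_normal((2, 512))
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
embeddings = centroids[labels] + 0.5 * rng.standard_normal((8, 512))

r = cosine_similarity_matrix(embeddings)
k = kaiser_n_factors(r)
loadings = varimax(principal_axis_factoring(r, k))
slopes = loadings_to_discrimination(loadings)
print("factors:", k)
print(np.round(loadings, 2))
```

With this synthetic input, the Kaiser criterion should suggest two factors, and after rotation the first four items should load mainly on one factor and the last four on the other. Rotation matters here because, with two blocks of similar size, the unrotated eigenvectors can mix the two constructs. Real sentence embeddings also tend to share a sizable baseline similarity, so in practice the choice of extraction and rotation method deserves more care than this sketch gives it.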