Abstract

Dictionary-based text analysis, in which researchers select keywords to measure constructs such as public sentiment, anxiety, or political attitudes in large text corpora, is widely used in computational social science. However, keyword selection is rarely subjected to the psychometric scrutiny applied to survey instruments: studies seldom report reliability, evaluate internal structure, or test whether the measurement holds across subpopulations or time points. Moreover, few existing methods enable the construction of measures that reflect theoretical or expected relationships among keywords. This paper proposes a method that brings these capabilities to text analysis by applying Confirmatory Factor Analysis (CFA) to word embeddings. Keywords are treated as observed indicators of a latent construct, and their semantic relationships, operationalized as centered cosine similarities between embedding vectors, serve as the input correlation matrix for CFA estimation. The framework enables researchers to estimate factor loadings and model fit indices (CFI, TLI, RMSEA, SRMR), compute reliability coefficients (Cronbach’s alpha, Omega), and test measurement invariance across groups or time periods using multigroup models with structured means. It also allows researchers to compare latent construct intensity across groups or time periods, transforming keyword-based text measures from descriptive indicators into formally comparable latent variables. The method is demonstrated through an empirical application to the discourse of war anxiety during Russia’s 2022 invasion of Ukraine. A Monte Carlo simulation further examines the behavior of fit indices under random keyword selection. The approach complements existing text analysis methods and can be implemented using standard software, such as the lavaan R package.
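To make the input to the CFA step concrete, the following is a minimal Python sketch of how a keyword-by-keyword similarity matrix might be built from embedding vectors. It rests on assumptions that the abstract does not confirm: "centered" is read here as subtracting each vector's own mean across dimensions (which makes the centered cosine equal to the Pearson correlation between two vectors), the embeddings are random toy data rather than a trained model, and the function names `centered_cosine_matrix` and `standardized_alpha` are hypothetical. The reliability value uses the standard formula for standardized Cronbach's alpha based on the mean inter-item correlation, which may differ from the estimator used in the paper.

```python
import numpy as np


def centered_cosine_matrix(embeddings: np.ndarray) -> np.ndarray:
    """Cosine similarity between rows after subtracting each row's own mean.

    Assumption: this reading of "centered" is illustrative only; the paper
    may define the centering step differently.
    """
    centered = embeddings - embeddings.mean(axis=1, keepdims=True)
    unit = centered / np.linalg.norm(centered, axis=1, keepdims=True)
    return unit @ unit.T


def standardized_alpha(corr: np.ndarray) -> float:
    """Standardized Cronbach's alpha from the mean off-diagonal correlation."""
    k = corr.shape[0]
    mean_r = corr[np.triu_indices(k, k=1)].mean()
    return float(k * mean_r / (1.0 + (k - 1) * mean_r))


# Toy data: four hypothetical keyword vectors that share a common component,
# standing in for embeddings of keywords that tap one latent construct.
rng = np.random.default_rng(0)
shared = rng.normal(size=50)
toy_embeddings = shared + 0.8 * rng.normal(size=(4, 50))

R = centered_cosine_matrix(toy_embeddings)  # 4 x 4, unit diagonal
alpha = standardized_alpha(R)

print(np.round(R, 2))
print(round(alpha, 3))
```

A matrix such as `R` could then be exported and supplied to CFA software as a sample correlation matrix; lavaan's `cfa()` accepts one through its `sample.cov` argument together with `sample.nobs`. What value to use as the number of observations for embedding-based input is a modelling decision that the abstract does not specify.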
Artur Pokropek studied this question.