August 1, 2024 · Open Access

Measuring complex psychological and sociological constructs in large-scale text

Abstract

In recent years, there has been an increasing exchange between social science and machine learning. In principle, natural language processing enables social scientists to systematically process large amounts of text, while rich domain knowledge helps machine learning scholars build valid models of social phenomena. However, clear guidelines for constructing valid and reliable mixed-methods approaches are lacking; such guidelines could increase the rigor and comparability of computational social science research. We provide a set of guidelines for leveraging human data annotation and automatic text classification at scale in five stages: (1) classification scheme development, (2) data labeling, (3) model selection, (4) model training and performance improvement, and (5) statistical analysis. Using examples from our own research on countering online hate, we outline potential problems and their respective solutions. We demonstrate how consistently integrating expertise from social science and machine learning can enhance the study of diverse social phenomena.

Authors

Alina Herderich

Jana Lasser

Mirta Galešić

Institutions

Harvard University

Arizona State University

University of Vermont
