Large language models (LLMs) increasingly serve as humanlike decision-making agents in social science and applied settings. These LLM agents are typically assigned humanlike characters and placed in real-life contexts. However, how these characters and contexts shape an LLM’s behavior remains underexplored. In this study the author proposes and tests methods for probing, quantifying, and modifying an LLM’s internal representations in a dictator game, a classic behavioral experiment on fairness and prosocial behavior. The author extracts “vectors of variable variations” (e.g., “male” to “female”) from the LLM’s internal state. Manipulating these vectors during the model’s inference can substantially alter how those variables relate to the model’s decision making. This approach offers a principled way to study and regulate how social concepts can be encoded and engineered within transformer-based models, with implications for alignment, debiasing, and designing artificial intelligence agents for social simulations in both academic and commercial applications, strengthening sociological theory and measurement.
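The abstract does not spell out how the "vectors of variable variations" are extracted or applied. As a rough illustration only, the sketch below assumes a difference-of-means construction: the vector is the gap between the average hidden state under one variable value (e.g., "male") and the average under another (e.g., "female"), and it is added to a hidden state with a scaling factor during inference. The function names, the toy three-dimensional states, and the `alpha` parameter are all hypothetical and are not taken from the paper.

```python
from statistics import fmean


def mean_vector(rows):
    """Column-wise mean of a list of equal-length hidden-state vectors."""
    return [fmean(col) for col in zip(*rows)]


def variation_vector(states_a, states_b):
    """Assumed construction: mean state under value B minus mean state under value A."""
    mean_a = mean_vector(states_a)
    mean_b = mean_vector(states_b)
    return [b - a for a, b in zip(mean_a, mean_b)]


def steer(hidden, direction, alpha):
    """Shift one hidden state along the direction; alpha is a hypothetical strength knob."""
    return [h + alpha * d for h, d in zip(hidden, direction)]


# Toy stand-ins for hidden states collected under two prompt variants.
male_states = [[1.0, 0.0, 2.0], [3.0, 0.0, 2.0]]
female_states = [[2.0, 1.0, 2.0], [4.0, 3.0, 2.0]]

direction = variation_vector(male_states, female_states)
steered = steer([0.5, 0.5, 0.5], direction, alpha=2.0)
print(direction, steered)
```

In a real transformer the states would be high-dimensional activations read from a chosen layer, and the shift would be applied inside the forward pass rather than to a standalone list.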
Ji Ma
Sociological Methodology
The University of Texas at Austin
New College
www.synapsesocial.com/papers/698d6df45be6419ac0d53473 — DOI: https://doi.org/10.1177/00811750261421220