This paper critically analyzes MBTI-based personality profiling using Large Language Models (LLMs), examining LLMs both as tools for inferring human personality and as subjects evaluated through psychometric frameworks. We review recent work (2020–2025) spanning traditional machine learning, fine-tuned transformer models, and zero-shot prompting approaches across datasets such as Kaggle MBTI, PersonalityCafe, Pandora, and MBTIBench. While top-performing LLM-based systems report 75%–85% accuracy at the dichotomy level, improvements over baselines are often modest, domain-dependent, and sensitive to dataset biases. Recent benchmarks employing soft labels reveal systematic issues, including polarized predictions, overconfidence, and limited calibration relative to population trait distributions. Beyond predictive performance, we examine emerging research that applies MBTI instruments directly to LLMs, showing that models exhibit reproducible yet context-dependent “personality-like” profiles, often skewed toward socially desirable traits due to alignment training. These findings raise conceptual questions about whether stable internal dispositions can meaningfully be attributed to generative systems whose outputs vary across prompts and versions. We argue that MBTI-based modeling with LLMs faces three core challenges: psychometric limitations of the MBTI construct itself, methodological weaknesses in self-reported training data, and philosophical ambiguity regarding the notion of AI personality. The paper concludes by outlining ethical risks, evaluation gaps, and research directions for more rigorous, calibrated, and theoretically grounded personality modeling in artificial intelligence systems.
Tshimula et al. studied this question.