Are state-of-the-art large language models conscious, or capable of anything like consciousness? We introduce ConsciousnessBench, the first systematic benchmark designed to empirically evaluate consciousness-relevant traits in frontier language models, grounded in five leading scientific theories of consciousness. We assess 8 advanced models via 840 self-report responses. We find statistically robust performance differences and, more importantly, evidence that models have distinct cognitive profiles and distinct strategies for engaging with consciousness-related constructs. Some models demonstrate theoretical fluency, specialization in certain cognitive tasks, or even phenomenological exploration, while others default to deflection. Although we cannot deliver a definitive verdict on AI consciousness, our findings show that consciousness-related capacities, and their computational diversity, are now empirically tractable, even if not yet empirically decidable.
Haoran Zheng studied this question.