As deepfake generation technologies advance rapidly, robust detection mechanisms have become crucial. This study investigates how well state-of-the-art Large Language Models (LLMs) detect deepfake images within a binary classification framework. We evaluate four prominent multimodal LLMs (GPT-4, Gemini 2.5 Pro, DeepSeek R1, and Sonar) on a balanced dataset of 100 real and 100 fake images sourced from the CelebHQ-FM and FFHQ-FM Face Manipulation datasets. Each model classifies a facial image as either authentic or manipulated, and performance is evaluated using confusion matrices and ROC curve analysis. GPT-4 achieves the highest overall performance with 87.5% accuracy and 91.4% precision, followed by Gemini 2.5 Pro (84.0% accuracy), while DeepSeek R1 and Sonar perform comparably at approximately 74% accuracy. This research contributes to AI-based forensic analysis by providing empirical evidence of LLMs' potential as deepfake detection tools and by establishing baseline performance metrics for future comparative studies.
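To make the reported metrics concrete, the following minimal sketch shows how accuracy and precision could be derived from a binary confusion matrix. It is illustrative only: the name `binary_metrics`, the choice of "fake" as the positive class, and the counts in the usage lines are assumptions, not the authors' code or the study's actual confusion matrices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionMatrix:
    """Counts for a binary classifier; 'fake' is assumed to be the positive class."""
    tp: int  # fake images labeled fake
    fp: int  # real images labeled fake
    tn: int  # real images labeled real
    fn: int  # fake images labeled real


def binary_metrics(cm: ConfusionMatrix) -> dict:
    """Return accuracy, precision, and recall for the given counts."""
    total = cm.tp + cm.fp + cm.tn + cm.fn
    if total == 0:
        raise ValueError("confusion matrix is empty")
    predicted_positive = cm.tp + cm.fp
    actual_positive = cm.tp + cm.fn
    return {
        "accuracy": (cm.tp + cm.tn) / total,
        # Guard against division by zero when nothing is predicted or labeled positive.
        "precision": cm.tp / predicted_positive if predicted_positive else 0.0,
        "recall": cm.tp / actual_positive if actual_positive else 0.0,
    }


# Hypothetical counts for a balanced set of 100 real and 100 fake images.
# These numbers are made up for illustration and are NOT results from the study.
example = ConfusionMatrix(tp=80, fp=10, tn=90, fn=20)
metrics = binary_metrics(example)
print(metrics)
```

On a balanced dataset such as the one described, accuracy weights both classes equally, while precision indicates how often an image flagged as fake really is fake.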
Sachin Kafle
Prakash Parajuli
University of Southern California
Tribhuvan University
Kafle et al. (Thu,) studied this question.
synapsesocial.com/papers/689dfea6d61984b91e13c823 — DOI: https://doi.org/10.36227/techrxiv.175459536.67892800/v1