What question did this study set out to answer?

The research aims to develop a detection system for identifying AI-generated captions specifically on Instagram, addressing limitations of existing systems.

March 22, 2026 | Open Access

AuthentiScan: PSO-Optimized Hybrid Feature Fusion for AI-Generated Instagram Caption Detection

Key Points

Developed AuthentiScan, a hybrid detection system for short-form Instagram captions.
Employed a feature fusion strategy combining TF-IDF, SBERT, and handcrafted linguistic indicators.
Optimized feature fusion weights using Particle Swarm Optimization (PSO).
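Evaluated the system on a balanced dataset of 1,580 captions sourced from real Instagram posts and from four AI text generators (ChatGPT, Gemini, Microsoft Copilot, and Perplexity AI).
Achieved 97.47% binary classification accuracy, a 2.22-percentage-point improvement over the unweighted (equal-weight concatenation) baseline.
Identified the generating AI system with 90.58% accuracy in a secondary source-attribution experiment.
Identified TF-IDF vocabulary features, through an ablation study, as the most influential for classification performance; SBERT added complementary semantic signal, while the handcrafted features proved redundant.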
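A minimal sketch of the weighted fusion step, assuming each feature block is scaled by a single scalar weight and then concatenated, so the unweighted baseline corresponds to every weight being 1.0. The handcrafted extractor covers the three indicators the abstract names (emoji frequency, em dash usage, AI-associated vocabulary). The `AI_VOCAB` word list, the emoji code-point ranges, the choice of per-word rates versus raw counts, and the helper names `handcrafted_features` and `fuse` are all hypothetical; the TF-IDF and SBERT blocks are placeholder vectors, not computed features.

```python
import re
from typing import List, Sequence

# Hypothetical "AI-associated" vocabulary; the paper's actual list is not given here.
AI_VOCAB = {"delve", "elevate", "embrace", "journey", "vibrant", "tapestry"}

# Hypothetical emoji detector: two common emoji code-point ranges, not exhaustive.
EMOJI_RE = re.compile("[\U0001F300-\U0001FAFF\u2600-\u27BF]")
WORD_RE = re.compile(r"[A-Za-z']+")


def handcrafted_features(caption: str) -> List[float]:
    """Return [emoji rate, em dash count, AI-vocabulary rate] for one caption."""
    words = WORD_RE.findall(caption.lower())
    n_words = max(len(words), 1)  # avoid division by zero on emoji-only captions
    emoji_rate = len(EMOJI_RE.findall(caption)) / n_words
    em_dashes = float(caption.count("\u2014"))  # U+2014 EM DASH
    ai_vocab_rate = sum(w in AI_VOCAB for w in words) / n_words
    return [emoji_rate, em_dashes, ai_vocab_rate]


def fuse(blocks: Sequence[Sequence[float]], weights: Sequence[float]) -> List[float]:
    """Scale each feature block by its weight, then concatenate the blocks."""
    if len(blocks) != len(weights):
        raise ValueError("need exactly one weight per feature block")
    fused: List[float] = []
    for block, w in zip(blocks, weights):
        fused.extend(w * x for x in block)
    return fused


if __name__ == "__main__":
    caption = "Embrace the journey \u2014 golden hour vibes \U0001F305\u2728"
    tfidf_block = [0.12, 0.0, 0.41]  # placeholder values, not real TF-IDF output
    sbert_block = [0.33, -0.08]      # placeholder values, not a real SBERT embedding
    hand_block = handcrafted_features(caption)
    # Hypothetical weights; (1.0, 1.0, 1.0) would reproduce plain concatenation.
    print(fuse([tfidf_block, sbert_block, hand_block], [0.6, 0.3, 0.1]))
```

In a full pipeline the TF-IDF block would typically come from a vectorizer such as scikit-learn's `TfidfVectorizer` and the SBERT block from a sentence-transformers model; both are left out here to keep the sketch dependency-free.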

Abstract

The proliferation of AI-generated content on social media platforms presents new challenges for authenticity verification and platform integrity. Existing AI text detection systems are predominantly designed for long-form content such as academic papers and news articles, and they perform poorly on short-form social media text, where context is limited and linguistic style is highly informal. This paper presents AuthentiScan, a hybrid detection system designed specifically for Instagram captions, a domain characterized by informal language, emoji use, hashtags, and text typically under 150 words. We propose a three-component feature fusion approach combining TF-IDF statistical features, Sentence-BERT (SBERT) semantic embeddings, and handcrafted linguistic indicators, including emoji frequency, em dash usage, and AI-associated vocabulary patterns. Fusion weights are optimized via Particle Swarm Optimization (PSO), enabling the model to learn the optimal contribution of each feature type rather than relying on equal-weight concatenation. Evaluated on a purpose-built, balanced dataset of 1,580 captions sourced from real Instagram posts and four AI systems (ChatGPT, Gemini, Microsoft Copilot, and Perplexity AI), our system achieves 97.47% binary classification accuracy, a 2.22-percentage-point improvement over the unweighted baseline. A secondary AI source attribution experiment achieves 90.58% accuracy in identifying which AI system generated a given caption. An ablation study reveals that TF-IDF vocabulary features dominate classification performance, while SBERT provides a complementary semantic signal and the handcrafted features prove redundant given the richer TF-IDF and SBERT representations.

Keywords: AI-generated text detection, short-form NLP, Instagram captions, Particle Swarm Optimization, SBERT, TF-IDF, hybrid feature fusion, social media authenticity, LLM detection
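The PSO weight search can be sketched as a standard global-best particle swarm over the three fusion weights. This is a generic PSO under stated assumptions, not the authors' configuration: the swarm size, inertia, acceleration coefficients, the [0, 1] search box, and `toy_objective` are all hypothetical. In the real system the objective would presumably be something like the validation accuracy of a classifier trained on features fused with the candidate weights; here a smooth stand-in with a known peak keeps the example self-contained.

```python
import random
from typing import Callable, List, Sequence, Tuple


def pso_maximize(
    objective: Callable[[Sequence[float]], float],
    dim: int,
    n_particles: int = 30,
    n_iters: int = 100,
    inertia: float = 0.7,     # hypothetical hyperparameters, not the paper's
    c_personal: float = 1.5,
    c_social: float = 1.5,
    lo: float = 0.0,
    hi: float = 1.0,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Global-best PSO that searches weight vectors in the box [lo, hi]^dim."""
    rng = random.Random(seed)
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # best position seen by each particle
    pbest_val = [objective(p) for p in pos]
    g = max(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # best position seen by the swarm

    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward swarm best.
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (pbest[i][d] - pos[i][d])
                             + c_social * r2 * (gbest[d] - pos[i][d]))
                # Move, then clip so the weights stay inside the search box.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val > pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val > gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(w: Sequence[float]) -> float:
    """Hypothetical stand-in for validation accuracy; peaks at (0.6, 0.3, 0.1)."""
    target = (0.6, 0.3, 0.1)
    return -sum((wi - ti) ** 2 for wi, ti in zip(w, target))


if __name__ == "__main__":
    best_w, best_val = pso_maximize(toy_objective, dim=3)
    print("TF-IDF, SBERT, handcrafted weights:", [round(x, 3) for x in best_w])
```

PSO needs only objective evaluations, not gradients, which suits an objective such as classification accuracy that is not differentiable in the weights. The trade-off is cost: if each evaluation means retraining a classifier, swarm size times iteration count sets the number of training runs.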
