The proliferation of AI-generated content on social media platforms presents new challenges for authenticity verification and platform integrity. Existing AI text detection systems are designed predominantly for long-form content such as academic papers and news articles, and they perform poorly on short-form social media text, where context is limited and the linguistic style is highly informal. This paper presents AuthentiScan, a hybrid detection system designed specifically for Instagram captions, a domain characterized by informal language, emoji use, hashtags, and text typically under 150 words. We propose a three-component feature fusion approach that combines TF-IDF statistical features, Sentence-BERT (SBERT) semantic embeddings, and handcrafted linguistic indicators, including emoji frequency, em dash usage, and AI-associated vocabulary patterns. Fusion weights are optimized via Particle Swarm Optimization (PSO), so the model learns the contribution of each feature type rather than relying on equal-weight concatenation. Evaluated on a purpose-built, balanced dataset of 1,580 captions sourced from real Instagram posts and four AI systems (ChatGPT, Gemini, Microsoft Copilot, and Perplexity AI), our system achieves 97.47% binary classification accuracy, an improvement of 2.22 percentage points over the unweighted baseline. A secondary AI source attribution experiment achieves 90.58% accuracy in identifying which AI system generated a given caption. An ablation study reveals that TF-IDF vocabulary features dominate classification performance, while SBERT provides a complementary semantic signal and the handcrafted features prove redundant given the richer feature representations.

Keywords: AI-generated text detection, short-form NLP, Instagram captions, Particle Swarm Optimization, SBERT, TF-IDF, hybrid feature fusion, social media authenticity, LLM detection
Divisha Varma Balaraju (Fri,) studied this question.
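The weighted fusion idea summarized in the abstract can be illustrated with a minimal sketch. It is not the AuthentiScan implementation: the function names (`fuse`, `centroid_accuracy`, `pso`, `make_toy`), the nearest-centroid classifier, the PSO hyperparameters, and the synthetic data are all assumptions made for illustration. The three toy feature blocks merely stand in for the TF-IDF, SBERT, and handcrafted features; each block is scaled by one weight before concatenation, and a basic PSO searches for weights that maximize validation accuracy.

```python
import random

def fuse(blocks, weights):
    """Scale each feature block by its weight, then concatenate the blocks."""
    return [w * x for w, block in zip(weights, blocks) for x in block]

def centroid_accuracy(train, test, weights):
    """Accuracy of a nearest-centroid classifier on the fused features.
    (A stand-in classifier; the abstract does not name the one actually used.)"""
    sums, counts = {}, {}
    for blocks, label in train:
        vec = fuse(blocks, weights)
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, x in enumerate(vec):
            acc[i] += x
        counts[label] = counts.get(label, 0) + 1
    centroids = {c: [s / counts[c] for s in sums[c]] for c in sums}
    correct = 0
    for blocks, label in test:
        vec = fuse(blocks, weights)
        pred = min(centroids,
                   key=lambda c: sum((a - b) ** 2 for a, b in zip(vec, centroids[c])))
        correct += pred == label
    return correct / len(test)

def pso(fitness, dim, n_particles=12, n_iters=30, seed=0,
        inertia=0.7, c1=1.5, c2=1.5):
    """Basic global-best PSO that maximizes `fitness` over the box [0, 1]^dim."""
    rng = random.Random(seed)
    pos = [[rng.random() for _ in range(dim)] for _ in range(n_particles)]
    pos[0] = [1.0] * dim  # seed one particle with the equal-weight baseline
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]
    best_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=best_fit.__getitem__)
    g_pos, g_fit = best_pos[g][:], best_fit[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (g_pos[d] - pos[i][d]))
                # Clamp each weight to the search box.
                pos[i][d] = min(1.0, max(0.0, pos[i][d] + vel[i][d]))
            f = fitness(pos[i])
            if f > best_fit[i]:
                best_fit[i], best_pos[i] = f, pos[i][:]
                if f > g_fit:
                    g_fit, g_pos = f, pos[i][:]
    return g_pos, g_fit

def make_toy(n, seed):
    """Synthetic data: an informative block, a weak block, and a noisy block."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        informative = [rng.gauss(label, 0.5) for _ in range(2)]
        weak = [rng.gauss(0.3 * label, 1.0) for _ in range(2)]
        noise = [rng.gauss(0.0, 3.0) for _ in range(2)]
        data.append(([informative, weak, noise], label))
    return data

train, val = make_toy(200, seed=1), make_toy(100, seed=2)
fitness = lambda w: centroid_accuracy(train, val, w)
baseline = fitness([1.0, 1.0, 1.0])   # equal-weight concatenation
weights, best = pso(fitness, dim=3)
print("equal-weight accuracy:", baseline)
print("PSO weights:", [round(w, 3) for w in weights], "accuracy:", best)
```

Because one particle starts at the equal-weight point and the global best never decreases, the accuracy found by this sketch is at least that of plain concatenation on the validation split. In a real pipeline the fitness would need to be computed on held-out data separate from the final test set, so that the tuned weights do not leak test information.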