With the rapid advancement of Large Language Model (LLM) platforms, these systems have become increasingly capable of producing literature and poetry. However, synthetic poetry still bears a visible artificial imprint, because it must adhere to specific structural rules. To comply with these rules, AI platforms rely on reusing lexical and sub-word expressions, producing verse that is acceptable but statistically distinct. This study addresses the specific case of Arabic poetry, for which these platforms cannot yet produce poems entirely free of prosodic errors. The research leverages both tendencies, elevated reuse rates and prosodic inconsistencies, to detect Arabic poems generated by various AI platforms. Detection is performed by extracting an 11-dimensional feature vector that captures the degree of character-level n-gram reuse against AI and human reference corpora, together with features derived from the prosodic transformation of the poem. Feeding this feature vector into different machine learning classifiers yielded an accuracy of up to 99.5% when detecting poems from ChatGPT-4o, the platform whose output was used for training. Accuracy reached 92.9% when detecting poems from five unseen platforms (ChatGPT-V5, Microsoft Copilot, Google Gemini, DeepSeek, and xAI Grok), none of whose generated content was used in training. This cross-platform generalization indicates a behavioral similarity in the content-generation methods of different AI systems.
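To make the idea of "character-level n-gram reuse against AI and human reference corpora" concrete, here is a minimal sketch. It is not the authors' feature extractor: the abstract does not list the 11 features, so the choice of n values, the per-n triple (AI reuse rate, human reuse rate, and their difference), and all function names (`char_ngrams`, `build_reference`, `reuse_rate`, `reuse_features`) are hypothetical. It also assumes the poem text is used as-is, with no Arabic diacritic normalization, and it omits the prosodic features entirely.

```python
def char_ngrams(text: str, n: int) -> list[str]:
    """Return the overlapping character n-grams of `text` (spaces included)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def build_reference(corpus: list[str], n: int) -> set[str]:
    """Collect every character n-gram seen anywhere in a reference corpus."""
    reference: set[str] = set()
    for doc in corpus:
        reference.update(char_ngrams(doc, n))
    return reference


def reuse_rate(poem: str, reference: set[str], n: int) -> float:
    """Fraction of the poem's n-grams that also occur in the reference set."""
    grams = char_ngrams(poem, n)
    if not grams:  # poem shorter than n characters
        return 0.0
    hits = sum(1 for g in grams if g in reference)
    return hits / len(grams)


def reuse_features(poem: str, ai_corpus: list[str], human_corpus: list[str],
                   ns: tuple[int, ...] = (2, 3, 4)) -> list[float]:
    """Hypothetical reuse features: for each n, the AI-corpus reuse rate,
    the human-corpus reuse rate, and their difference (AI minus human)."""
    features: list[float] = []
    for n in ns:
        ai_rate = reuse_rate(poem, build_reference(ai_corpus, n), n)
        human_rate = reuse_rate(poem, build_reference(human_corpus, n), n)
        features.extend([ai_rate, human_rate, ai_rate - human_rate])
    return features


if __name__ == "__main__":
    # Toy corpora for illustration only; real reference corpora would be large.
    ai_corpus = ["يا ليل الصب متى غده"]
    human_corpus = ["قفا نبك من ذكرى حبيب ومنزل"]
    print(reuse_features("يا ليل متى غده", ai_corpus, human_corpus))
```

A positive difference term means the poem overlaps more with the AI reference than with the human one. In a full pipeline, such values would be concatenated with prosodic features and passed to a standard classifier; the exact composition of the study's 11-dimensional vector is not specified here.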
Ghunaim et al. studied this question.