Accurate identification of AI-generated content is critical for preserving scientific credibility. This exploratory study assessed the effectiveness of the free versions of eight AI detection tools in differentiating human-written from AI-generated articles in the field of oral and maxillofacial surgery. The analysis included 24 human-written articles and 12 AI-generated articles produced using ChatGPT, DeepSeek, Gemini, and Copilot. The primary outcome was the detection effectiveness of each tool, expressed as a mean percentage score, for human-written and AI-generated text. Secondary outcomes were usability and processing limitations. Statistical analysis was performed in RStudio, with significance set at P < 0.05. For published human-written text, QuillBot classified every article correctly (none was flagged as AI-written) and was fast and easy to use. For the AI-generated texts, Copyleaks performed best (mean score 99.6/100), followed by Sapling (mean score 95.6/100). The correlation between manuscript length and detection effectiveness was weak and non-significant for both published human-written texts (ρ = -0.15, P = 0.44) and AI-generated texts (ρ = -0.08, P = 0.70). QuillBot appears to be an accessible and effective tool for distinguishing human-written from AI-generated text. Its effectiveness could be enhanced by using it alongside other detection tools such as Sapling or Copyleaks, allowing articles produced with excessive reliance on AI to be detected.
Grillo et al. studied this question.
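The length-versus-detection correlation reported above can be illustrated with a short sketch. It assumes that ρ denotes Spearman's rank correlation, which the notation suggests but the abstract does not state. The study itself used RStudio; Python is used here only for illustration. The function names and the sample values are hypothetical and are not taken from the study, and the sketch computes ρ only, not the P value.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have equal length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when one variable is constant")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data, NOT from the study: manuscript word counts and the
# AI-likelihood score (0-100) a detector might assign to each manuscript.
word_counts = [2100, 3400, 2800, 4100, 3900, 2500]
ai_scores = [12.0, 3.0, 0.0, 5.0, 0.0, 8.0]
rho = spearman_rho(word_counts, ai_scores)
print(f"rho = {rho:.2f}")
```

A value of ρ near zero, as in the reported results, would indicate that longer manuscripts are neither systematically easier nor harder for a tool to classify.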