April 30, 2024 (Open Access)

Beyond Binary: A Multi-Class Approach to Large Language Model-Generated Text Percentage Estimation

Abstract

AI use in the workplace has risen to 22% of those polled, underscoring the growing demand for detecting AI-generated material. DetectGPT outperformed previous models with an AUROC of 0.95, demonstrating its effectiveness at identifying text written by LLMs. Still, the subtle characteristics typical of language-model-generated text remain difficult to recognize. This study presents perc-DETECTLLM, a novel method that combines a zero-shot model with a Bayesian model to estimate the percentage of a text that is LLM-generated. Compared with other models, perc-DETECTLLM shows promising results, with an accuracy of 72.1% and a recall of 81.0%, addressing some of the difficulties in machine-generated text detection. Our dataset comprises more than 500K samples from different domains, which enables thorough model training and evaluation. perc-DETECTLLM achieves promising text classification results by using Bayesian inference to estimate the overall percentage of LLM-generated text. Ensemble techniques perform better and are used as a standard of comparison for LLM detection models. Compared with rule-based approaches, perc-DETECTLLM's data-driven methodology improves accuracy and reliability. Overall, the experimental results are promising and could be extended to other domains, but difficulties remain with dynamic thresholding and perturbation-function optimization, which point to areas for further study. perc-DETECTLLM makes authentication of textual content in the digital age feasible and represents a substantial improvement in LLM-generated text detection.
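The abstract says the overall percentage of LLM-generated text is estimated with Bayesian inference but does not give the formulation. The sketch below is one plausible way such an estimate could work, offered only as an illustration: it assumes (hypothetically) that a document is split into `n` segments, that a binary detector flags `k` of them as LLM-generated, and that the detector's true-positive rate `tpr` and false-positive rate `fpr` are known. Under a uniform prior on the LLM-generated fraction θ, each segment is flagged with probability θ·tpr + (1 - θ)·fpr, and the posterior over θ is approximated on a grid. The function name, parameters, prior, and segment-level model are all illustrative assumptions, not perc-DETECTLLM's actual method.

```python
import math


def posterior_llm_fraction(k, n, tpr, fpr, grid_size=1001):
    """Grid approximation of the posterior over theta, the fraction of
    LLM-generated segments in a document.

    Hypothetical setup (not taken from the paper):
      - the document is split into n segments scored by a binary detector;
      - k segments were flagged as LLM-generated;
      - the detector has true-positive rate `tpr` and false-positive rate `fpr`;
      - the prior on theta is uniform on [0, 1].
    Returns (posterior_mean, grid, posterior_weights).
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie in [0, n]")
    if not 0 <= fpr < tpr <= 1:
        raise ValueError("need 0 <= fpr < tpr <= 1 for theta to be identifiable")

    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_w = []
    for theta in grid:
        # Probability that one segment is flagged: true positives plus false positives.
        p = theta * tpr + (1 - theta) * fpr
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against log(0)
        # Binomial log-likelihood; the binomial coefficient does not depend on theta.
        log_w.append(k * math.log(p) + (n - k) * math.log(1 - p))

    # Normalise in log space for numerical stability (uniform prior adds a constant).
    m = max(log_w)
    w = [math.exp(lw - m) for lw in log_w]
    total = sum(w)
    w = [x / total for x in w]
    mean = sum(t * x for t, x in zip(grid, w))
    return mean, grid, w
```

For example, `posterior_llm_fraction(12, 20, tpr=0.9, fpr=0.1)` returns a posterior mean and the full posterior weights, from which a credible interval can be read off. Setting `tpr=1` and `fpr=0` (a perfect detector) reduces the model to the standard Beta(k+1, n-k+1) posterior, whose mean is (k+1)/(n+2). Modeling detector error explicitly matters because a raw flag rate overstates the LLM fraction whenever the false-positive rate is nonzero.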
