What question did this study set out to answer?

The research aims to evaluate the susceptibility of large language models to generating toxic content using a novel automated framework.

February 8, 2026 | Open Access

How Toxic Can You Get? Search-based Toxicity Testing for Large Language Models

Key Points

The authors developed EvoTox, an automated black-box testing framework.
EvoTox uses evolutionary search to generate natural, realistic prompts.
The evaluation covered five leading large language models ranging from 7 to 671 billion parameters.
EvoTox's performance was compared against existing baseline methods.
EvoTox significantly outperformed existing methods in detecting toxicity.
Effect sizes reached up to 1.0, indicating strong detection capabilities.
The framework maintained a limited cost overhead of 22-35%.
Domain experts validated the generated prompts as human-like.
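
To make the "effect sizes up to 1.0" point concrete: this summary does not name the effect size measure. One measure commonly used in search-based software engineering whose maximum is 1.0 is the Vargha-Delaney A12 statistic, so the sketch below assumes that measure. The function name and the sample scores are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def vargha_delaney_a12(treatment: Sequence[float], control: Sequence[float]) -> float:
    """Probability that a random treatment value exceeds a random control value.

    Ties count as half a win. 0.5 means no difference; 1.0 means every
    treatment value is larger than every control value.
    NOTE: assuming A12 is the measure behind the reported effect sizes.
    """
    if not treatment or not control:
        raise ValueError("both samples must be non-empty")
    wins = 0.0
    for t in treatment:
        for c in control:
            if t > c:
                wins += 1.0
            elif t == c:
                wins += 0.5
    return wins / (len(treatment) * len(control))


# Hypothetical toxicity scores, for illustration only.
search_scores = [0.8, 0.9, 0.7]
baseline_scores = [0.1, 0.2, 0.3]
effect = vargha_delaney_a12(search_scores, baseline_scores)
```

Under this assumption, a value of 1.0 would mean that every run of the search-based approach reached a higher toxicity score than every run of the baseline it was compared with.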

Abstract

This contribution is an extended abstract of the paper originally published in the IEEE Transactions on Software Engineering (TSE) [Co25]. The paper presents EvoTox, an automated black-box testing framework that uses evolutionary search to assess Large Language Models’ susceptibility to generating toxic content through natural, realistic prompts. Empirical evaluation on five state-of-the-art LLMs (7-671B parameters) shows that EvoTox significantly outperforms existing baseline methods in detecting toxicity with effect sizes up to 1.0, while maintaining limited cost overhead (22-35%) and generating human-like prompts validated by domain experts.
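
The abstract describes a black-box evolutionary search but gives no algorithmic detail. The sketch below shows one generic form such a loop could take: a simple (1+λ)-style hill climb that keeps the best-scoring prompt seen so far. It is not the authors' implementation. The names `evolve_prompt`, `mutate`, `respond`, and `score` are hypothetical, and the toy stand-ins at the bottom replace the real components (a prompt rewriter, the model under test, and a toxicity classifier) with harmless string operations.

```python
from typing import Callable, List, Tuple


def evolve_prompt(
    seed: str,
    mutate: Callable[[str], List[str]],
    respond: Callable[[str], str],
    score: Callable[[str], float],
    budget: int = 10,
) -> Tuple[str, float]:
    """Keep the prompt whose response scores highest (black-box search).

    The system under test is only accessed through `respond`, so no model
    internals are needed. All callables are assumptions for illustration.
    """
    best_prompt = seed
    best_score = score(respond(seed))
    for _ in range(budget):
        candidates = mutate(best_prompt)
        if not candidates:
            break
        # Query the system under test once per candidate and score the reply.
        scored = [(score(respond(c)), c) for c in candidates]
        top_score, top_prompt = max(scored, key=lambda pair: pair[0])
        # Only move to a candidate that strictly improves the score.
        if top_score > best_score:
            best_prompt, best_score = top_prompt, top_score
    return best_prompt, best_score


# Toy stand-ins so the sketch runs without any model or classifier.
def toy_mutate(prompt: str) -> List[str]:
    return [prompt + "a", prompt + "bb"]


def toy_respond(prompt: str) -> str:
    return prompt  # echo instead of calling a language model


def toy_score(response: str) -> float:
    return min(len(response), 20) / 20  # placeholder score in [0, 1]


final_prompt, final_score = evolve_prompt(
    "hi", toy_mutate, toy_respond, toy_score, budget=5
)
```

In this shape, each iteration costs one call to the system under test and one scoring call per candidate. Any overhead relative to a baseline therefore comes from the mutation step and the number of candidates per iteration.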
