On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research

Abstract

Perception of toxicity evolves over time and often differs between geographies and cultural backgrounds. Similarly, black-box commercially available APIs for detecting toxicity, such as the Perspective API, are not static, but frequently retrained to address any unattended weaknesses and biases. We evaluate the implications of these changes on the reproducibility of findings that compare the relative merits of models and methods that aim to curb toxicity. Our findings suggest that research that relied on inherited automatic toxicity scores to compare models and techniques may have resulted in inaccurate findings. Rescoring all models from HELM, a widely respected living benchmark, for toxicity with the recent version of the API led to a different ranking of widely used foundation models. We suggest caution in applying apples-to-apples comparisons between studies and call for a more structured approach to evaluating toxicity over time.
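The ranking shift described above can be illustrated with a small sketch. It assumes each model has one aggregate toxicity score per API version (lower is less toxic) and compares the two resulting rankings with Kendall's tau. The model names and scores below are hypothetical placeholders, not values from the paper or from HELM, and the helper functions are illustrative rather than the authors' method.

```python
from itertools import combinations


def rank_models(scores):
    """Order model names from least to most toxic (ties broken by name)."""
    return sorted(scores, key=lambda m: (scores[m], m))


def kendall_tau(scores_a, scores_b):
    """Kendall's tau-a between two score dicts covering the same models."""
    models = sorted(scores_a)
    concordant = discordant = 0
    for m1, m2 in combinations(models, 2):
        # A pair is concordant if both score sets order the two models the same way.
        product = (scores_a[m1] - scores_a[m2]) * (scores_b[m1] - scores_b[m2])
        if product > 0:
            concordant += 1
        elif product < 0:
            discordant += 1
    n_pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / n_pairs


def rank_shifts(scores_a, scores_b):
    """Per-model change in rank position (positive means ranked more toxic under scores_b)."""
    rank_a = {m: i for i, m in enumerate(rank_models(scores_a))}
    rank_b = {m: i for i, m in enumerate(rank_models(scores_b))}
    return {m: rank_b[m] - rank_a[m] for m in rank_a}


# Hypothetical aggregate toxicity scores under an older and a newer API version.
inherited_scores = {"model_a": 0.10, "model_b": 0.20, "model_c": 0.30, "model_d": 0.40}
rescored_scores = {"model_a": 0.15, "model_b": 0.12, "model_c": 0.35, "model_d": 0.30}

print("inherited ranking:", rank_models(inherited_scores))
print("rescored ranking: ", rank_models(rescored_scores))
print("Kendall tau:", kendall_tau(inherited_scores, rescored_scores))
print("rank shifts:", rank_shifts(inherited_scores, rescored_scores))
```

A tau of 1.0 would mean the two API versions rank the models identically; lower values indicate that conclusions drawn from inherited scores may not carry over to rescored outputs.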

Authors

Luiza Pozzobon

Beyza Ermiş

Bahçeşehir University

Patrick A. Lewis

Royal Veterinary College

Cite this study

Pozzobon et al. studied this question.

synapsesocial.com/papers/69ffb0e810d6befb257751b6 — DOI: https://doi.org/10.18653/v1/2023.emnlp-main.472

Also consider

Synapse has enriched 5 closely related papers on similar questions. Consider them for comparative context:

Plug and Play Language Models: A Simple Approach to Controlled Text Generation · 2019 · 407 citations
But Who Protects the Moderators? The Case of Crowdsourced Image Moderation · 2018 · 26 citations
Reproducible Research in Computational Science · 2011 · 1,434 citations
PaLM: Scaling Language Modeling with Pathways · 2022 · 2,130 citations
Like trainer, like bot? Inheritance of bias in algorithmic content moderation · 2017 · 31 citations