
April 17, 2026 | Open Access

Making machine learning good enough – studying the political endeavour of finding ‘right’ metrics and thresholds

Key Points

Investigates how negotiations over metrics and thresholds influence the application of machine learning (ML) systems
Conducted ethnographic fieldwork with the British Broadcasting Corporation (BBC)
Analysed the Recommendations Team's process of defining 'good enough' for recommender systems
Examined the entanglement of organisational objectives with data and technical infrastructures
Followed A/B testing practices used to evaluate system performance against thresholds
Identified that the concept of 'good enough' is politically negotiated within the BBC
Showed that adjustments to performance thresholds reflect competing values and constraints
Demonstrated the team's reliance on both quantitative metrics and qualitative evaluations in establishing thresholds

Abstract

As machine learning (ML) technologies move from their discrete existence in research to being highly applied technologies across society, critical scholars have begun to address the epistemological conditions that shape the emergence of such systems and their societal implications. In this paper, we investigate a specific epistemological condition of ML, namely, how ML systems rely on ongoing negotiations and agreements about what counts as ‘good enough’ to be deployed. We do so by drawing on ethnographic fieldwork with the British Broadcasting Corporation (BBC), a large data- and value-driven organisation. In studying the epistemological function and politics of ‘good enough’, we take an AI lab studies approach, following the Recommendations Team's efforts to materialise ‘good enoughness’ and make it negotiable as they develop and modify recommender systems that aim to better serve the BBC's audiences. Through our ethnographic account, we demonstrate how the team relies on various metrics and qualitative evaluations to inform provisional performance thresholds before submitting the ML systems to A/B testing to establish whether one of them is ‘good enough’ to deploy. By following these processes of establishing ‘good enough’, we see how these negotiations are entangled in various, sometimes competing, organisational objectives, as well as particular data and technical infrastructures. By extension, we show how the metricised performance scores of A/B testing are negotiated in practice by readjusting performance thresholds to manoeuvre between different values and constraints. Ultimately, our paper shows that establishing ‘good enough’ is a political endeavour of adjusting seemingly objective evaluation criteria to find the best-fitting metrics and ‘right’ thresholds.
