What question did this study set out to answer?

This investigation aims to explore how developers test machine learning systems and identify barriers to adopting advanced testing techniques.

May 15, 2026Open Access

Understanding Machine Learning testing in practice

Puntos clave

This investigation aims to explore how developers test machine learning systems and identify barriers to adopting advanced testing techniques.
Conducted a mining study of 398 open-source repositories to analyze testing strategies and tool usage.
Surveyed 100 practitioners to understand their perceptions, motivations, and challenges in ML testing.
Developers primarily use foundational strategies like Smoke Testing and Rule-Based Checking, relying on custom solutions.
Identified a significant gap in the adoption of specialized tools and advanced techniques like Metamorphic Testing due to integration challenges.

Resumen

Machine Learning is increasingly embedded in critical software systems, making their quality assurance a matter of growing concern. While the research community has proposed several techniques for testing ML-enabled systems, there is limited empirical evidence on whether these techniques are adopted in practice or align with developers’ testing workflows. This paper presents a two-step empirical investigation aimed at characterizing the current landscape of ML testing in real-world development. Our goal is to understand how developers approach testing, whether proposed techniques are adopted, and what barriers hinder their implementation. We designed a mixed-method study that triangulates insights from two complementary sources: (1) a mining study of 398 open-source repositories to analyze implemented testing strategies and tool usage; and (2) a survey of 100 practitioners to capture perceptions, motivations, and practical challenges. Our findings reveal that developers rely heavily on foundational strategies like Smoke Testing and Rule-Based Checking , implemented through custom testing logic built on general-purpose libraries (e.g., PyTest , NumPy ). Conversely, we identified a critical adoption gap in specialized tools and advanced techniques such as Metamorphic Testing , which are rarely implemented despite their academic prominence. Our survey indicates that this gap is driven by practical barriers, including high integration costs and a poor fit with existing developer workflows. These findings suggest that future research and tooling must prioritize usability, integration, and a clearer alignment with the pragmatic needs of developers. • Large-scale mixed-method investigation of ML testing practices in real-world development. • Triangulated insights from 398 open-source repositories (2, 018 test files) and 100 practitioners. • Practitioners rely on foundational strategies like Smoke Testing, implemented via custom solutions. • Critical adoption gap for specialized tools and advanced techniques due to workflow integration barriers. • Released datasets, analysis scripts, and a technical report to enable replication.

Leer artículo completoexternamente

Preguntar a la IA

Me gusta

Guardar

Ver artículo completo