March 13, 2024 · Open Access

Evaluation metrics and statistical tests for machine learning

Key points

Evaluation metrics assist in assessing machine learning model performance across various tasks.
Commonly used metrics include those for binary classification, multi-class classification, and regression tasks.
Statistical tests help researchers compare model performance and interpret the results correctly, as illustrated with convolutional neural networks applied to medical imaging. A better understanding of these tools may improve the overall effectiveness of machine learning applications.
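As a rough illustration of the binary classification metrics mentioned above, the following minimal sketch derives a few of them from confusion-matrix counts. The function name `binary_metrics` and the label lists are hypothetical and chosen only for this example; they are not taken from the article.

```python
def binary_metrics(y_true, y_pred):
    """Compute common binary classification metrics from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    def safe_div(numerator, denominator):
        # Return 0.0 instead of raising when a class is absent.
        return numerator / denominator if denominator else 0.0

    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)  # also called sensitivity
    return {
        "accuracy": safe_div(tp + tn, len(pairs)),
        "precision": precision,
        "recall": recall,
        "specificity": safe_div(tn, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical ground-truth labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Returning 0.0 for an undefined ratio is one possible convention; other tools may raise a warning or return NaN instead.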

Abstract

Research on different machine learning (ML) methods has become incredibly popular during the past few decades. However, researchers who are not familiar with statistics might find it difficult to understand how to evaluate the performance of ML models and compare them with each other. Here, we introduce the most common evaluation metrics used for the typical supervised ML tasks, including binary, multi-class, and multi-label classification, regression, image segmentation, object detection, and information retrieval. We explain how to choose a suitable statistical test for comparing models, how to obtain enough values of the metric for testing, and how to perform the test and interpret its results. We also present a few practical examples of comparing convolutional neural networks used to classify X-rays with different lung infections and to detect cancer tumors in positron emission tomography images.
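To make the idea of comparing two models on repeated metric values concrete, here is a minimal sketch of an exact paired sign-flip permutation test. The choice of test is an assumption made for illustration, since the abstract does not name a specific test, and the per-fold accuracies and the function name `paired_permutation_test` are hypothetical.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired metric values.

    Under the null hypothesis that the two models perform equally well,
    the sign of each paired difference is equally likely to be + or -.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    extreme = 0
    total = 0
    # Enumerate every sign assignment (2**n of them; fine for small n).
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
        total += 1
    return mean(diffs), extreme / total


# Hypothetical accuracies of two models on the same 8 cross-validation folds.
model_a = [0.91, 0.89, 0.93, 0.90, 0.92, 0.88, 0.94, 0.90]
model_b = [0.88, 0.87, 0.90, 0.89, 0.90, 0.85, 0.91, 0.88]
mean_diff, p_value = paired_permutation_test(model_a, model_b)
print(f"mean difference = {mean_diff:.4f}, p = {p_value:.4f}")
```

A small p-value suggests that the observed difference would be unlikely if the models performed equally well on these folds. Metric values from overlapping cross-validation folds are not fully independent, so a result like this should be read with caution.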
