April 1, 2019

Towards Explaining the Effects of Data Preprocessing on Machine Learning

Abstract

Ensuring the explainability of machine learning models is an active research topic, naturally associated with notions of algorithmic transparency and fairness. While most approaches focus on the problem of making the model itself explainable, we note that many of the decisions that affect the model's predictive behaviour are made during data preprocessing, and are encoded as specific data transformation steps as part of pre-learning pipelines. Our research explores metrics to quantify the effect of some of these steps. In this initial work we define a simple metric, which we call volatility, to measure the effect of including/excluding a specific step on predictions made by the resulting model. Using training set rebalancing as a concrete example, we report on early experiments on measuring volatility in two public benchmark datasets, Students' Academic Performance and German Credit, with the ultimate goal of identifying predictors for volatility that are independent of the dataset and of the specific preprocessing step.
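
The abstract does not give a formula for volatility, so the following is only a minimal sketch under a stated assumption: volatility is taken here to be the fraction of test instances whose predicted label changes when the preprocessing step (training set rebalancing) is included versus excluded. The helper names `volatility`, `oversample`, and `knn_predict`, the deterministic oversampling scheme, the k-nearest-neighbour stand-in model, and the toy data are all hypothetical illustrations and are not taken from the paper.

```python
from collections import Counter


def volatility(preds_with, preds_without):
    """Assumed definition: share of instances whose predicted label
    differs between the pipeline with the step and the one without it."""
    if len(preds_with) != len(preds_without):
        raise ValueError("prediction lists must have equal length")
    if not preds_with:
        raise ValueError("need at least one prediction")
    changed = sum(a != b for a, b in zip(preds_with, preds_without))
    return changed / len(preds_with)


def oversample(X, y):
    """Hypothetical rebalancing step: repeat each smaller class's rows
    cyclically until every class is as large as the largest one."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = list(X), list(y)
    for label, n in counts.items():
        rows = [x for x, lab in zip(X, y) if lab == label]
        for i in range(target - n):
            X_out.append(rows[i % len(rows)])
            y_out.append(label)
    return X_out, y_out


def knn_predict(X_train, y_train, X_test, k=3):
    """Stand-in model: k-nearest-neighbour majority vote on 1-D features."""
    preds = []
    for q in X_test:
        nearest = sorted(zip(X_train, y_train), key=lambda p: abs(p[0] - q))[:k]
        votes = Counter(lab for _, lab in nearest)
        preds.append(votes.most_common(1)[0][0])
    return preds


# Toy, made-up data: six majority-class rows (label 0), two minority rows (label 1).
X_train = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0, 9.0, 10.0]
y_train = [0, 0, 0, 0, 0, 0, 1, 1]
X_test = [1.0, 6.8, 8.0, 9.5]

# Pipeline without the rebalancing step.
preds_without = knn_predict(X_train, y_train, X_test)

# Pipeline with the rebalancing step applied to the training set only.
X_bal, y_bal = oversample(X_train, y_train)
preds_with = knn_predict(X_bal, y_bal, X_test)

print(volatility(preds_with, preds_without))  # 0.25 on this toy data
```

On this toy data only the test point at 6.8 changes label once the minority class is oversampled, which gives a volatility of 0.25 under the assumed definition. Other definitions, for example ones based on changes in predicted probabilities, would be equally compatible with the abstract.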
