This paper presents a method for dataset manipulation based on Mixed Integer Linear Programming (MILP). The proposed optimization can narrow down a dataset to a particular size, while enforcing specific distributions across different dimensions. It essentially leverages the redundancies of an initial dataset in order to generate more compact versions of it, with a specific target distribution across each dimension. If the desired target distribution is uniform, then the effect is balancing: all values across all different dimensions are equally represented. Other types of target distributions can also be specified, depending on the nature of the problem. The proposed approach may be used in machine learning, for shaping training and testing datasets, or in crowdsourcing, for preparing datasets of a manageable size.
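The selection problem described above can be made concrete on a toy dataset. The following is a minimal sketch, not the paper's formulation: it assumes a hypothetical six-item dataset with two categorical dimensions, a uniform target, and an L1 deviation between achieved and target counts as the objective. It finds the best subset by exhaustive enumeration, which is only feasible at this scale. In a MILP, the same choice would be encoded with one binary variable per item, a constraint fixing the subset size, and a linearized deviation objective handed to a solver.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy dataset: each item carries one value per dimension.
items = [("a", "x"), ("a", "x"), ("a", "x"), ("a", "y"), ("b", "x"), ("b", "y")]
subset_size = 4  # assumed target size of the compact dataset


def deviation(subset, items, targets):
    """Total L1 gap between achieved and target counts over all dimensions."""
    total = 0
    for dim, target in enumerate(targets):
        counts = Counter(items[i][dim] for i in subset)
        total += sum(abs(counts[value] - want) for value, want in target.items())
    return total


# Uniform target: every value of a dimension should appear equally often.
targets = []
for dim in range(len(items[0])):
    values = sorted({item[dim] for item in items})
    targets.append({value: subset_size / len(values) for value in values})

# Exhaustive search stands in for the solver; it enumerates every subset
# of the requested size and keeps the one with the smallest deviation.
best = min(
    combinations(range(len(items)), subset_size),
    key=lambda subset: deviation(subset, items, targets),
)
best_deviation = deviation(best, items, targets)
```

On this toy data a perfectly balanced subset of four items exists, so the search reaches zero deviation. The original dataset is redundant in the ("a", "x") combination, and that redundancy is what lets a smaller, balanced subset be carved out of it. A non-uniform target would only change the numbers stored in `targets`.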
Vonikakis et al. (Wed,) studied this question.