April 1, 2024

Bridging efficacy and efficiency: Innovations in Shapley value estimation for model-agnostic data valuation in machine learning

Abstract

The escalating advancement of generative AI models amplifies the imperative for adept data valuation techniques. Amidst a myriad of methodologies, various Shapley value estimation techniques, such as Data Shapley, have garnered attention for their proficient data valuation capabilities, despite computational challenges when grappling with large datasets. This paper introduces an innovative, empirically driven batch method, aiming to expedite data valuation while preserving precision. This method strategically optimizes training batch sizes and testing subsets, effectively striking a balance between computational efficiency and valuation accuracy, a critical step forward given the substantial volume of data processed in contemporary machine learning tasks. A thorough evaluation of different Shapley value estimation techniques is conducted, underscoring TMC-Shapley for its notable efficacy. Furthermore, the exploration delves into the model-agnostic nature of Shapley value estimations, utilizing diverse machine learning models across distinct training phases. This practice not only demonstrates the versatility of Shapley value methods but also highlights their adaptability and generalizability across varied model architectures, reaffirming the significance of this approach in the broader context of machine learning research. The holistic approach and findings presented herein serve as a robust foundation for future explorations and optimizations in the realm of data valuation, paving the way for more nuanced and efficient methodologies.
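
The abstract names TMC-Shapley (truncated Monte Carlo Shapley) without describing its steps. As a rough illustration only, the sketch below follows the commonly described idea: sample random permutations of the training points, accumulate each point's marginal contribution to a utility score, and stop evaluating a permutation once the score is within a tolerance of the full-data score. It is not the paper's batch method. The function name `tmc_shapley`, its parameters, and the additive `toy_utility` are all hypothetical; in practice the utility would be something like the test accuracy of a model trained on the chosen subset.

```python
import random
from typing import Callable, List, Sequence


def tmc_shapley(
    n_points: int,
    utility: Callable[[Sequence[int]], float],
    n_permutations: int = 200,
    tolerance: float = 0.01,
    seed: int = 0,
) -> List[float]:
    """Estimate per-point Shapley values by truncated Monte Carlo sampling.

    All names here are illustrative assumptions, not the paper's API.
    """
    rng = random.Random(seed)
    full_score = utility(list(range(n_points)))
    values = [0.0] * n_points

    for t in range(1, n_permutations + 1):
        perm = list(range(n_points))
        rng.shuffle(perm)
        subset: List[int] = []
        prev_score = utility(subset)

        for idx in perm:
            if abs(full_score - prev_score) < tolerance:
                # Truncation: the score is already close to the full-data
                # score, so remaining points are assigned zero marginal gain.
                marginal = 0.0
            else:
                subset.append(idx)
                new_score = utility(subset)
                marginal = new_score - prev_score
                prev_score = new_score
            # Running mean of marginal contributions over permutations.
            values[idx] += (marginal - values[idx]) / t

    return values


# Hypothetical additive utility: each point contributes a fixed amount,
# so the exact Shapley value of a point equals its weight.
WEIGHTS = [0.5, 0.3, 0.2, 0.0]


def toy_utility(subset: Sequence[int]) -> float:
    return sum(WEIGHTS[i] for i in subset)


estimates = tmc_shapley(len(WEIGHTS), toy_utility, n_permutations=50, tolerance=1e-9)
```

With a real model, each call to `utility` would involve retraining, which is where the cost on large datasets comes from; the truncation tolerance trades a small bias for fewer such calls.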

Cite This Study

Yilu Yang studied this question.

synapsesocial.com/papers/68e7103bb6db643587689e76 https://doi.org/10.1117/12.3027116
