September 24, 2024

“Heart on My Sleeve”: From Memorization to Duty

Abstract

Can a machine learning model infringe on a copyright? That is, do machine learning models store protected content? This work-in-progress law review Article focuses on empirical data developed, in part, to answer that question, and the answer is yes. A set of unconditional image generators (diffusion models, n = 14) is trained on small slices of a dataset consisting of celebrities’ faces. The synthetic data output from these generators is then compared to the training data using a variety of similarity metrics. As the empirical data shows, the question is not whether models can contain copyrighted works, but whether they do. In some cases, there is a 99% chance that a model will generate an image nearly identical to its training data; in other cases, even after 10,000 generations, a model does not produce any images that may be considered identical (though finding similarity is nonetheless possible). This Article uses the empirical data to argue for a series of duties to be placed on model owners.

Nathan Reitinger studied this question.

synapsesocial.com/papers/68e578a6b6db6435875183cf https://doi.org/10.31228/osf.io/q73vy
