Training autoencoders on their own outputs caused a drastic drop in performance, leading to nearly complete correlation of outputs after approximately 100,000 steps.
Training autoencoders on their own generated ECG data causes catastrophic performance collapse, demonstrating the risks of AI data self-contamination.
Classical autoencoders (AE) learn a compressed, meaningful representation of the input data, and denoising autoencoders (DAE) capture the true underlying data manifold even when inputs are noisy. Like all of artificial intelligence, every autoencoder type depends on data. When well trained, however, every type produces outputs that closely resemble its inputs, so generated outputs can end up in the data used for further training. We show on ECG signals that adding AE/DAE-generated reconstructions to the training set, a step intended to augment the data, causes catastrophic performance collapse.
Thomas Schanze (Wed,) conducted a study in ECG signal processing (n=300). A denoising autoencoder (DAE) and a classical autoencoder (AE) were compared using the average pairwise correlation index between network model outputs.
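The text does not define the average pairwise correlation index, so the following is only a minimal sketch of one plausible reading: the mean Pearson correlation over all unordered pairs of model outputs. The function names `pearson` and `average_pairwise_correlation` and the toy signals are hypothetical illustrations, not the study's code or ECG data.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    dx = [a - mean_x for a in x]
    dy = [b - mean_y for b in y]
    denom = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if denom == 0.0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sum(a * b for a, b in zip(dx, dy)) / denom


def average_pairwise_correlation(outputs):
    """Mean Pearson correlation over all unordered pairs of outputs.

    Assumption: this is one plausible definition of the index named in
    the text; the source does not spell out the formula.
    """
    pairs = list(combinations(outputs, 2))
    if not pairs:
        raise ValueError("need at least two outputs")
    return sum(pearson(a, b) for a, b in pairs) / len(pairs)


# Hypothetical toy outputs (not ECG data): three distinct waveforms ...
diverse = [
    [0.0, 1.0, 0.0, -1.0],
    [1.0, 0.0, -1.0, 0.0],
    [0.0, -1.0, 0.0, 1.0],
]
# ... versus three outputs that differ only by scale, i.e. the same shape.
collapsed = [
    [0.0, 1.0, 0.0, -1.0],
    [0.0, 2.0, 0.0, -2.0],
    [0.0, 3.0, 0.0, -3.0],
]

print(average_pairwise_correlation(diverse))
print(average_pairwise_correlation(collapsed))
```

Under this reading, an index approaching 1 means the outputs have become nearly identical in shape regardless of the input. That is how the reported "nearly complete correlation of outputs" would show up numerically.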