The classic machine learning setting involves learning the distribution of data that is assumed to be i.i.d. (independent and identically distributed). In recent years, ML has expanded to applications that fall outside the i.i.d. regime. When data arrive sequentially, the distribution may change continually, or consecutive samples may be temporally correlated; either way, the i.i.d. assumption is violated. Examples include incremental learning, time-series modelling and reinforcement learning (RL).

This note explores the challenges of learning from a changing data distribution D_t, where the subscript t indexes an ordered sequence such as time or training iterations. In RL, D_t corresponds to the experiences collected by the policy after it has been trained for t iterations (or epochs). Because the weights of the policy network change continually, the distribution of the experiences the policy collects also shifts throughout training.

It is instructive to think of training a neural network under a changing distribution D_t as learning a new task for each t, namely the information contained in D_t. The challenge in sequential settings such as RL is then twofold. First, learning later tasks can interfere with earlier ones: the weights that encode a given task may be altered while learning a new task, degrading performance on the previously learned task. This is known as catastrophic forgetting. Second, while one needs to stabilise performance on old tasks, it is just as important to keep learning from recent experiences. Balancing these two requirements is known as the stability-plasticity dilemma, which has parallels in learning in the human brain.
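To make catastrophic forgetting concrete, here is a minimal toy sketch, not taken from any particular paper or library. It assumes a 3-weight linear model trained with plain SGD on squared error (standing in for a neural network) and two hypothetical single-example "tasks" that share the middle weight. Training on task A and then on task B mimics a data distribution that changes over training; interleaving the two mimics i.i.d. sampling from their mixture.

```python
# Toy sketch of catastrophic forgetting (assumption: a linear model stands in
# for a neural network; TASK_A and TASK_B are hypothetical toy tasks).
# Both tasks use the middle weight, so fitting one can overwrite the other.

TASK_A = ([1.0, 1.0, 0.0], 1.0)   # hypothetical task A: (features, target)
TASK_B = ([0.0, 1.0, 1.0], -1.0)  # hypothetical task B: (features, target)


def predict(w, x):
    # Linear prediction w . x
    return sum(wi * xi for wi, xi in zip(w, x))


def loss(w, task):
    # Squared error of the model on a single task example
    x, y = task
    return (predict(w, x) - y) ** 2


def sgd_step(w, task, lr):
    # Gradient of (w . x - y)^2 with respect to w is 2 * err * x
    x, y = task
    err = predict(w, x) - y
    return [wi - lr * 2.0 * err * xi for wi, xi in zip(w, x)]


def train(w, schedule, lr=0.1):
    # Apply one SGD step per task in the given order
    for task in schedule:
        w = sgd_step(w, task, lr)
    return w


steps = 200
w0 = [0.0, 0.0, 0.0]

# Sequential: all of task A, then all of task B (the data distribution shifts).
w_seq = train(train(w0, [TASK_A] * steps), [TASK_B] * steps)

# Joint: tasks interleaved, approximating i.i.d. samples from the mixture.
w_joint = train(w0, [TASK_A, TASK_B] * steps)

print(f"sequential: loss A = {loss(w_seq, TASK_A):.4f}, loss B = {loss(w_seq, TASK_B):.4f}")
print(f"joint:      loss A = {loss(w_joint, TASK_A):.4f}, loss B = {loss(w_joint, TASK_B):.4f}")
```

Joint training drives both losses to roughly zero (w approaches (1, 0, -1)), so the model has enough capacity to solve both tasks at once. Sequential training fits task B but leaves the loss on task A at about 0.56, because the shared middle weight was pushed away from the value task A needed. Interleaving old and new data preserves task A at the cost of mixing in stale experience, which is one face of the stability-plasticity trade-off described above.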
Vishagan Sivanesan (Fri,) studied this question.