Continual learning (CL) aims to enable machine learning (ML) models to learn continually from new data while building on previously acquired knowledge without forgetting it. As models have evolved from small to large pretrained architectures, and from supporting unimodal to multimodal (MM) data, multimodal CL (MMCL) methods have recently emerged. The primary challenge of MMCL is that it cannot be reduced to simply stacking unimodal CL methods: such straightforward approaches often suffer from MM catastrophic forgetting and yield unsatisfactory performance. In addition, MMCL introduces new challenges that unimodal CL methods fail to adequately address, including modality imbalance, complex modality interaction, high computational costs, and degradation of the pretrained zero-shot capability of MM backbones. In this work, we present the first comprehensive survey on MMCL. We provide essential background knowledge and MMCL settings, as well as a structured taxonomy of MMCL methods. We group MMCL methods into four categories, i.e., regularization-based, architecture-based, replay-based, and prompt-based methods, explaining their methodologies and highlighting their key innovations. Additionally, to promote further research in this field, we summarize open MMCL datasets and benchmarks, provide an in-depth discussion, and outline several promising future directions. We have also created a GitHub repository that indexes relevant MMCL articles and open resources, available at https://github.com/LucyDYu/Awesome-Multimodal-Continual-Learning.
Yu et al. studied this question.