Does Continual Learning = Catastrophic Forgetting?

Anh Thai, Stefan Stojanov, Isaac Rehg, James M. Rehg

Georgia Institute of Technology

Continual learning is known to suffer from catastrophic forgetting, a phenomenon in which concepts learned earlier are forgotten as more recent training samples arrive. In this work, we present our main novel finding that reconstruction tasks (3D shape reconstruction, 2.5D sketch estimation, and image autoencoding) do not suffer from catastrophic forgetting ([Sec. 1]). We attempt to explain this behavior in [Sec. 2]. We further show that a reconstruction proxy task can improve the performance of continual classification in [Sec. 3]. In [Sec. 4] we introduce YASS, a novel yet simple baseline for classification. Finally, in [Sec. 5] we present DyRT, a novel tool for tracking the dynamics of representation learning in continual learning models.

1. Reconstruction Tasks Do Not Suffer from Catastrophic Forgetting

We demonstrate that, unlike classification, reconstruction tasks do not suffer from catastrophic forgetting when learned continually. The reconstruction models are trained with standard SGD on each learning exposure; no additional losses, external memory, or other mechanisms are needed to achieve good continual learning performance. Average accuracy over the classes seen so far is shown at each learning exposure in the plots below, followed by a minimal sketch of the training protocol.

Reconstruction tasks do not suffer from catastrophic forgetting in single exposure case.

Repeated exposures lead asymptotically to batch performance.
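The training procedure behind these results is intentionally plain. Below is a minimal sketch, assuming a placeholder encoder-decoder (`ReconNet`) and per-exposure data loaders; the actual network architectures and reconstruction losses in the paper differ, and this only illustrates "standard SGD on each exposure, nothing else".

```python
import torch
import torch.nn as nn

# Placeholder encoder-decoder; stands in for the actual reconstruction networks.
class ReconNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(nn.Flatten(), nn.Linear(64 * 64, 256), nn.ReLU())
        self.decoder = nn.Sequential(nn.Linear(256, 64 * 64), nn.Sigmoid())

    def forward(self, x):
        return self.decoder(self.encoder(x)).view_as(x)

def train_continually(model, exposures, epochs=10, lr=0.01):
    """Standard SGD on each learning exposure; weights carry over, nothing else is stored."""
    criterion = nn.MSELoss()  # placeholder reconstruction loss
    for exposure in exposures:  # each exposure is an iterable of (input, target) batches
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        for _ in range(epochs):
            for x, target in exposure:
                optimizer.zero_grad()
                loss = criterion(model(x), target)
                loss.backward()
                optimizer.step()
    return model
```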

2. Positive Representation Transfer in Reconstruction Tasks

We demonstrate that, unlike classification, the 3D shape reconstruction task is able to propagate the learned feature representation forward between learning exposures (referred to as positive forward transfer), which is presumably one of the keys to the success of reconstruction tasks. In the following experiment, we utilize GDumb [1] and a variant, GDumb++, which differs from GDumb only in that the model trained at each exposure is initialized with the weights learned in the previous exposure (i.e., it does not start from scratch). Since GDumb and GDumb++ are trained on the same amount of input data, any performance gap is attributable to the value of propagating the learned representation; a sketch of this distinction follows the plot below.

For the 3D reconstruction task, GDumb++ shows a significant advantage over GDumb.
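The only difference between the two variants is how the model is initialized at each learning exposure. Below is a hedged sketch of that distinction; `model_init_fn` and `train_fn` are assumed helpers (fresh model construction and GDumb-style training on the balanced memory buffer), not part of the original GDumb code.

```python
import copy

def gdumb_step(model_init_fn, train_fn, memory, prev_model=None, carry_weights=False):
    """One learning exposure of GDumb / GDumb++ (helper names are illustrative).

    GDumb:   retrain from scratch on the balanced memory buffer at every exposure.
    GDumb++: initialize from the previous exposure's weights, then retrain.
    Both see exactly the same data, so any performance gap reflects the value of
    carrying the learned representation forward.
    """
    if carry_weights and prev_model is not None:
        model = copy.deepcopy(prev_model)   # GDumb++: start from the previous weights
    else:
        model = model_init_fn()             # GDumb: fresh random initialization
    train_fn(model, memory)                 # identical training budget in both cases
    return model
```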

3. Proxy Task for Continual Classification

We utilize the feature representation learned by a continually trained 3D shape reconstruction model and apply nearest-class-mean (NCM) classification on top of it. We maintain an exemplar set of 20 images per class, randomly chosen from the training set at each learning exposure, together with their class labels. The performance of this proxy task is on par with SOTA classification algorithms that use the same exemplar set size (see the plot and the NCM sketch below).

Performance of the shape proxy task and classification methods.
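Below is a minimal sketch of the NCM proxy classifier, assuming `encoder` is the frozen feature extractor of the continually trained reconstruction model and `exemplars` maps each class label to its stored exemplar images; names and shapes are illustrative.

```python
import torch

@torch.no_grad()
def ncm_classify(encoder, exemplars, query_images):
    """Nearest-class-mean classification on top of reconstruction features.

    exemplars: dict mapping class label -> tensor of exemplar images (e.g. 20 per class)
    """
    labels, means = [], []
    for label, images in exemplars.items():
        feats = encoder(images)               # (num_exemplars, feat_dim)
        means.append(feats.mean(dim=0))       # class mean in feature space
        labels.append(label)
    means = torch.stack(means)                # (num_classes, feat_dim)

    query_feats = encoder(query_images)       # (num_queries, feat_dim)
    dists = torch.cdist(query_feats, means)   # distance to each class mean
    return [labels[i] for i in dists.argmin(dim=1).tolist()]
```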

4. YASS—Simple Baseline for Classification

We introduce YASS, a class-incremental learning method that is a simple extension of a batch learner with standard SGD and a data-balancing technique. We demonstrate that this simple design surprisingly achieves SOTA performance; a sketch of the data-balancing step is given below.
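As one hedged illustration of the data-balancing idea, the sketch below builds a class-balanced loader over the current exposure's data plus stored exemplars by weighting samples inversely to class frequency; this is a common balancing scheme and not necessarily the exact one used by YASS.

```python
from collections import Counter

import torch
from torch.utils.data import ConcatDataset, DataLoader, WeightedRandomSampler

def balanced_loader(new_data, exemplar_data, batch_size=64):
    """Class-balanced loader over current-exposure data plus stored exemplars.

    Each sample is weighted inversely to its class frequency so that the old
    (minority) classes are sampled as often as the abundant new classes during SGD.
    Assumes both datasets yield (image, integer label) pairs.
    """
    dataset = ConcatDataset([new_data, exemplar_data])
    labels = [dataset[i][1] for i in range(len(dataset))]
    counts = Counter(labels)
    weights = torch.tensor([1.0 / counts[y] for y in labels], dtype=torch.double)
    sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
    return DataLoader(dataset, batch_size=batch_size, sampler=sampler)
```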

5. DyRT—A Novel Tool To Quantify Forgetting

We introduce DyRT (Dynamic Representation Tracking), a novel tool for quantifying the dynamics of forgetting in the visual feature representation during continual learning. With DyRT, we demonstrate that the FC layer is more prone to forgetting than the feature extractor, confirming findings from prior work. A simplified illustration of this kind of comparison is sketched below.
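DyRT's exact bookkeeping is described in the paper; as a rough illustration of the kind of comparison it supports, the sketch below contrasts accuracy through the model's own FC classifier with accuracy from an NCM probe placed directly on the frozen feature extractor, which isolates how much of the degradation comes from the final layer. The probing setup here is a simplified assumption, not DyRT itself.

```python
import torch

@torch.no_grad()
def probe_forgetting(feature_extractor, fc_head, probe_set, eval_set):
    """Compare FC-layer accuracy against an NCM probe on the frozen features.

    probe_set / eval_set: iterables of (images, integer-label tensor) batches.
    A large gap between the two accuracies suggests that forgetting is
    concentrated in the FC layer rather than in the feature representation.
    """
    # Build class means for the NCM probe from the probe set.
    sums, counts = {}, {}
    for images, labels in probe_set:
        for f, y in zip(feature_extractor(images), labels.tolist()):
            sums[y] = sums.get(y, torch.zeros_like(f)) + f
            counts[y] = counts.get(y, 0) + 1
    classes = sorted(sums)
    means = torch.stack([sums[c] / counts[c] for c in classes])

    fc_correct = ncm_correct = total = 0
    for images, labels in eval_set:
        feats = feature_extractor(images)
        fc_pred = fc_head(feats).argmax(dim=1)
        ncm_idx = torch.cdist(feats, means).argmin(dim=1)
        ncm_pred = torch.tensor([classes[i] for i in ncm_idx.tolist()], device=labels.device)
        fc_correct += (fc_pred == labels).sum().item()
        ncm_correct += (ncm_pred == labels).sum().item()
        total += labels.numel()
    return fc_correct / total, ncm_correct / total
```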

Citation

Bibliographic information for this work:

Thai, A., Stojanov, S., Rehg, I., Rehg, J. (2020). Does Continual Learning = Catastrophic Forgetting?

References

1. Ameya Prabhu, Philip Torr, and Puneet Dokania. GDumb: A simple approach that questions our progress in continual learning. In ECCV, 2020.