DASH: Warm-Starting Neural Network Training Without Loss of Plasticity Under Stationarity

Abstract

Warm-starting neural networks by initializing them with previously learned weights is appealing, as practical neural networks are often deployed under a continuous influx of new data. However, this often leads to a phenomenon known as loss of plasticity, where the network loses its ability to learn new information and thereby shows worse generalization performance than those trained from scratch. While this issue has been actively studied in non-stationary data distributions, it surprisingly occurs even when the data distribution remains stationary, and its underlying mechanism is poorly understood. To address this gap, we develop a framework emulating real-world neural network training. Under this framework, we identify noise memorization as the primary cause of plasticity loss when warm-starting networks on stationary data. Motivated by this discovery, we propose Direction-Aware SHrinking (DASH), an effective method aiming to mitigate plasticity loss by selectively forgetting memorized noise while preserving learned features. We validate our approach on vision classification task, demonstrating consistent improvements in test accuracy and training efficiency.

Publication
Neural Information Processing Systems 2024
Chulhee Yun
Chulhee Yun
Assistant Professor

I am an assistant professor at KAIST AI. I am interested in optimization and machine learning theory.