Continual Learning and Neural Networks’ Scaling Limit(s)
In this project, we aim to study the effect of a network's architecture on continual learning, with a specific focus on scaling the network to large width and depth, and on the interplay of these scaling limits with other architectural components such as residual connections.
Continual learning is a machine learning paradigm that focuses on learning a set of tasks in a sequential fashion. The ideal objective is to learn new tasks flexibly (plasticity) without forgetting what has already been learned (stability). It is well established that neural networks suffer from catastrophic forgetting: they fail to retain knowledge even across very few tasks. Although several methods have been developed to counteract it, finding a good trade-off between stability and plasticity remains a largely open problem in continual learning [Wang et al., 2023].
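To make the phenomenon concrete, here is a minimal sketch of catastrophic forgetting on synthetic data: a small network is trained on task A, then on task B without revisiting task A's data, and the loss on task A is measured before and after. The two-task regression setup, network sizes, and hyperparameters are illustrative assumptions, not the project's actual benchmark.

    import torch

    torch.manual_seed(0)
    d, N, steps, lr = 16, 256, 300, 0.1

    def make_task():
        # A toy regression task: random inputs with a random linear teacher.
        X = torch.randn(128, d)
        w = torch.randn(d, 1)
        return X, X @ w / d ** 0.5

    net = torch.nn.Sequential(torch.nn.Linear(d, N), torch.nn.ReLU(), torch.nn.Linear(N, 1))
    opt = torch.optim.SGD(net.parameters(), lr=lr)
    mse = torch.nn.MSELoss()

    def train(X, y):
        for _ in range(steps):
            opt.zero_grad()
            mse(net(X), y).backward()
            opt.step()

    (XA, yA), (XB, yB) = make_task(), make_task()

    train(XA, yA)
    print(f"task A loss after learning A: {mse(net(XA), yA).item():.4f}")

    train(XB, yB)  # task B is learned with no further access to task A's data
    print(f"task A loss after learning B: {mse(net(XA), yA).item():.4f}  <- forgetting")
    print(f"task B loss after learning B: {mse(net(XB), yB).item():.4f}")

Running this, the task A loss is near zero after learning A and jumps back up after learning B, which is precisely the stability failure that continual learning methods try to mitigate.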
Why Scaling Limits? It turns out that there are several ways to scale up the architecture, resulting in different limits with different properties, especially when it comes to feature learning. For instance, when the network's width N is taken to infinity, the network enters the so-called kernel regime, where the predictions can be described in closed form with the machinery of kernels and Gaussian processes [Neal, 1996, Lee et al., 2017, Jacot et al., 2018]. Crucially for this project, there is no feature learning in this limit! Without feature learning, the network is expected to forget less catastrophically (at the cost of reduced plasticity), as shown in Mirzadeh et al. [2022].
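The approach to the kernel regime is easy to observe empirically. Below is a minimal sketch, assuming a two-layer ReLU network in the NTK parametrization (O(1) weight entries, 1/sqrt(fan-in) factors in the forward pass) trained on toy data: the relative movement of the hidden-layer features after training shrinks as the width N grows, which is the finite-width signature of the "no feature learning" limit.

    import torch

    torch.manual_seed(0)
    d, n, steps, lr = 16, 64, 200, 0.5
    X, y = torch.randn(n, d), torch.randn(n, 1)

    for N in [64, 256, 1024, 4096]:
        # NTK parametrization: standard-normal weights, scaling in the forward pass.
        W = torch.randn(d, N, requires_grad=True)
        a = torch.randn(N, 1, requires_grad=True)

        def phi(W):
            return torch.relu(X @ W / d ** 0.5)  # hidden-layer features

        h0 = phi(W).detach()  # features at initialization
        opt = torch.optim.SGD([W, a], lr=lr)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((phi(W) @ a / N ** 0.5 - y) ** 2).mean()
            loss.backward()
            opt.step()

        rel = ((phi(W).detach() - h0).norm() / h0.norm()).item()
        print(f"N={N:5d}  relative feature movement: {rel:.4f}")

The printed feature movement decays roughly like 1/sqrt(N): at infinite width the features are frozen and only the kernel-regime readout changes.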
Alternatively, if the architecture and learning rate are parametrized slightly differently, the network learns features at every layer as N → ∞ [Yang and Hu, 2020, Bordelon and Pehlevan, 2022]. Recently, these limits have been extended to infinite depth L → ∞ as well [Bordelon et al., 2023]! In practice, the rate of feature learning can be easily controlled through a hyperparameter (as you will learn during the project).
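As a sketch of how a single hyperparameter can control the rate of feature learning, the snippet below uses the output-rescaling ("lazy training") convention: the centered network output is multiplied by a scale gamma and the learning rate divided by gamma^2, so larger gamma pushes training toward the lazy, kernel-like regime and smaller gamma toward the rich, feature-learning regime. The exact parametrization and the name of the knob differ across the papers cited above (e.g., the richness parameter of Bordelon and Pehlevan [2022]); this is one illustrative convention at fixed width, with toy data and hyperparameters chosen for the demonstration.

    import torch

    torch.manual_seed(0)
    d, N, n, steps = 16, 512, 64, 300
    X, y = torch.randn(n, d), torch.randn(n, 1)

    # gamma is the laziness/richness scale: output multiplied by gamma,
    # learning rate divided by gamma**2.
    for gamma in [0.5, 2.0, 8.0]:
        W = torch.randn(d, N, requires_grad=True)
        a = torch.randn(N, 1, requires_grad=True)

        def raw(W, a):
            return torch.relu(X @ W / d ** 0.5) @ a / N ** 0.5

        f0 = raw(W, a).detach()  # centering so the model output starts at zero
        h0 = torch.relu(X @ W / d ** 0.5).detach()  # features at initialization

        opt = torch.optim.SGD([W, a], lr=0.2 / gamma ** 2)
        for _ in range(steps):
            opt.zero_grad()
            loss = ((gamma * (raw(W, a) - f0) - y) ** 2).mean()
            loss.backward()
            opt.step()

        h = torch.relu(X @ W / d ** 0.5).detach()
        rel = ((h - h0).norm() / h0.norm()).item()
        print(f"gamma={gamma:4.1f}  relative feature movement: {rel:.4f}")

All three runs fit the training targets, but the amount of feature movement decreases monotonically with gamma: the same architecture can be tuned anywhere between the lazy and rich regimes.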
The existence of different limits raises several questions: Which limit is best for continual learning? What is the role of feature learning? Can we obtain consistent improvements as we scale up the architecture, while making the most efficient use of the parameters?