Search Results for author: Francesca Mignacco

Found 9 papers, 3 papers with code

Dissecting the Interplay of Attention Paths in a Statistical Mechanics Theory of Transformers

1 code implementation • 24 May 2024 • Lorenzo Tiberi, Francesca Mignacco, Kazuki Irie, Haim Sompolinsky

Our theory shows that the predictor statistics are expressed as a sum of independent kernels, each one pairing different 'attention paths', defined as information pathways through different attention heads across layers.
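To make the "sum over attention paths" picture concrete, here is a toy sketch: each path picks one head per layer, inputs are propagated along that path to give a per-path feature map, and the predictor kernel is the (uniformly weighted) sum of the per-path Gram matrices. The fixed random attention matrices and the uniform weighting are illustrative assumptions only, not the paper's construction.

```python
# Toy illustration (NOT the paper's derivation): build a kernel as a
# sum of independent per-path kernels, one per choice of head per layer.
import itertools
import numpy as np

rng = np.random.default_rng(0)
N, D, L, H = 8, 16, 2, 3           # samples, dim, layers, heads per layer

X = rng.standard_normal((N, D))
# One fixed random "attention" matrix per (layer, head) -- an assumption
# made purely for illustration.
A = rng.standard_normal((L, H, D, D)) / np.sqrt(D)

n_paths = H ** L
K = np.zeros((N, N))
for path in itertools.product(range(H), repeat=L):   # H**L attention paths
    phi = X
    for layer, head in enumerate(path):              # propagate along the path
        phi = phi @ A[layer, head]
    K += phi @ phi.T / n_paths                       # per-path kernel, summed
```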

Nonlinear classification of neural manifolds with contextual information

no code implementations • 10 May 2024 • Francesca Mignacco, Chi-Ning Chou, SueYeon Chung

Understanding how neural systems efficiently process information through distributed representations is a fundamental challenge at the interface of neuroscience and machine learning.

Classification

Forward Learning with Top-Down Feedback: Empirical and Analytical Characterization

no code implementations • 10 Feb 2023 • Ravi Srinivasan, Francesca Mignacco, Martino Sorbaro, Maria Refinetti, Avi Cooper, Gabriel Kreiman, Giorgia Dellaferrera

"Forward-only" algorithms, which train neural networks while avoiding a backward pass, have recently gained attention as a way of solving the biologically unrealistic aspects of backpropagation.

Rigorous dynamical mean field theory for stochastic gradient descent methods

1 code implementation • 12 Oct 2022 • Cedric Gerbelot, Emanuele Troiani, Francesca Mignacco, Florent Krzakala, Lenka Zdeborova

We prove closed-form equations for the exact high-dimensional asymptotics of a family of first-order gradient-based methods, learning an estimator (e.g., an M-estimator, a shallow neural network, etc.) from observations on Gaussian data with empirical risk minimization.
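One common way to write the family of first-order methods in question (notation assumed here for concreteness, up to normalization conventions; see the paper for the precise setting) is the regularized ERM update

$$ w^{t+1} = w^{t} - \eta \left[ \sum_{\mu=1}^{n} s^{t}_{\mu}\, \nabla_{w}\, \ell\!\left(y_{\mu}, \tfrac{x_{\mu}^{\top} w^{t}}{\sqrt{d}}\right) + \lambda\, \nabla_{w} r(w^{t}) \right], $$

where the selection variables $s^{t}_{\mu} \in \{0,1\}$ pick out the mini-batch at step $t$ (so $s^{t}_{\mu} \equiv 1$ recovers full-batch gradient descent), $\ell$ is the loss, and $r$ a regularizer.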

Learning curves for the multi-class teacher-student perceptron

1 code implementation • 22 Mar 2022 • Elisabetta Cornacchia, Francesca Mignacco, Rodrigo Veiga, Cédric Gerbelot, Bruno Loureiro, Lenka Zdeborová

For Gaussian teacher weights, we investigate the performance of ERM with both cross-entropy and square losses, and explore the role of ridge regularisation in approaching Bayes-optimality.

Binary Classification • Learning Theory +1
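A hedged sketch of the multi-class teacher-student setup described above: Gaussian inputs, Gaussian teacher weights, labels given by the argmax of the teacher's outputs, then ERM with a ridge-regularized square loss in closed form. Dimensions, sample ratio, and the ridge strength are illustrative choices, not the paper's settings.

```python
# Teacher-student toy experiment: square-loss ERM with ridge penalty,
# evaluated against fresh samples from the same Gaussian teacher.
import numpy as np

rng = np.random.default_rng(0)
d, K, alpha, lam = 200, 3, 2.0, 0.1        # dim, classes, n/d ratio, ridge
n = int(alpha * d)

W_teacher = rng.standard_normal((K, d))
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = np.argmax(X @ W_teacher.T, axis=1)      # teacher labels (argmax rule)
Y = np.eye(K)[y]                            # one-hot targets

# Ridge ERM in closed form: W = (X^T X + lam I)^{-1} X^T Y.
W_hat = np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ Y).T

# Generalization error on fresh data from the same teacher.
X_test = rng.standard_normal((5 * n, d)) / np.sqrt(d)
y_test = np.argmax(X_test @ W_teacher.T, axis=1)
err = np.mean(np.argmax(X_test @ W_hat.T, axis=1) != y_test)
print(f"test error: {err:.3f}")
```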

The effective noise of Stochastic Gradient Descent

no code implementations • 20 Dec 2021 • Francesca Mignacco, Pierfrancesco Urbani

In the under-parametrized regime, where the final training error is positive, the SGD dynamics reaches a stationary state, and we define an effective temperature from the fluctuation-dissipation theorem, computed within dynamical mean-field theory.
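The effective temperature alluded to here is the standard dynamical one: writing $C(t,t')$ for a two-time correlation function and $R(t,t')$ for the associated linear response (notation assumed for concreteness), a generalized fluctuation-dissipation relation defines $T_{\mathrm{eff}}$ via

$$ R(t,t') = \frac{1}{T_{\mathrm{eff}}}\, \frac{\partial C(t,t')}{\partial t'}, \qquad t > t', $$

so that $T_{\mathrm{eff}}$ can be read off from the slope of the integrated response plotted against the correlation.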

Stochasticity helps to navigate rough landscapes: comparing gradient-descent-based algorithms in the phase retrieval problem

no code implementations • 8 Mar 2021 • Francesca Mignacco, Pierfrancesco Urbani, Lenka Zdeborová

In this paper we investigate how gradient-based algorithms such as gradient descent, (multi-pass) stochastic gradient descent, its persistent variant, and the Langevin algorithm navigate non-convex loss landscapes, and which of them reaches the best generalization error at limited sample complexity.

Navigate • Retrieval
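As a toy version of the comparison above, the sketch below sets up the standard phase-retrieval loss on synthetic Gaussian data and runs two of the algorithms compared (full-batch gradient descent vs. mini-batch SGD). The model size, step size, and batch size are illustrative assumptions, not the paper's settings.

```python
# Phase retrieval toy: observe y = (x . w*)^2, minimize the squared
# residual with full-batch GD and with mini-batch SGD, then measure
# the overlap of each estimate with the hidden signal.
import numpy as np

rng = np.random.default_rng(0)
d, n, lr, batch, steps = 100, 300, 0.05, 30, 2000

w_star = rng.standard_normal(d)
X = rng.standard_normal((n, d)) / np.sqrt(d)
y = (X @ w_star) ** 2                      # sign-less (phaseless) observations

def grad(w, idx):
    """Gradient of (1/4) * mean((z^2 - y)^2) with z = X w, over idx."""
    z = X[idx] @ w
    return X[idx].T @ ((z ** 2 - y[idx]) * z) / len(idx)

w_gd = rng.standard_normal(d)
w_sgd = w_gd.copy()
for t in range(steps):
    w_gd -= lr * grad(w_gd, np.arange(n))             # full-batch GD
    w_sgd -= lr * grad(w_sgd, rng.choice(n, batch))   # mini-batch SGD

for name, w in [("GD", w_gd), ("SGD", w_sgd)]:
    overlap = abs(w @ w_star) / (np.linalg.norm(w) * np.linalg.norm(w_star))
    print(f"{name}: overlap with signal = {overlap:.3f}")
```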
