no code implementations • 21 Feb 2024 • Daniel Beaglehole, Peter Súkeník, Marco Mondelli, Mikhail Belkin
In this work, we provide substantial evidence that deep neural collapse (DNC) formation occurs primarily through deep feature learning with the average gradient outer product (AGOP).
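For intuition, here is a minimal Python sketch of the AGOP; the toy architecture and data below are illustrative assumptions, not the paper's setup.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(10, 32), torch.nn.ReLU(), torch.nn.Linear(32, 1)
)
X = torch.randn(200, 10, requires_grad=True)

# Per-sample gradients of the scalar output with respect to the input.
(grads,) = torch.autograd.grad(net(X).sum(), X)

# AGOP: the average of the outer products grad_i grad_i^T over the data.
agop = grads.T @ grads / X.shape[0]
print(agop.shape)  # (10, 10)
```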
no code implementations • 7 Feb 2024 • Kevin Kögler, Alexander Shevchenko, Hamed Hassani, Marco Mondelli
For the prototypical case of the 1-bit compression of sparse Gaussian data, we prove that gradient descent converges to a solution that completely disregards the sparse structure of the input.
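A hedged sketch of this data model: sparse (Bernoulli-Gaussian) inputs passed through a random linear encoder and a sign nonlinearity, i.e. one bit per latent coordinate. The dimensions, sparsity level, and naive linear decoder are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n, p = 100, 20, 1000, 0.1  # input dim, code dim, samples, sparsity

# Sparse Gaussian data: each entry is nonzero with probability p.
X = rng.normal(size=(n, d)) * (rng.random((n, d)) < p)

W = rng.normal(size=(d, m)) / np.sqrt(d)  # random linear encoder
codes = np.sign(X @ W)                    # 1-bit compression
X_hat = codes @ W.T                       # naive linear decoder (up to scaling)
print(codes.shape, X_hat.shape)
```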
no code implementations • 5 Feb 2024 • Simone Bombari, Marco Mondelli
Unveiling the reasons behind the exceptional success of transformers requires a better understanding of why attention layers are suitable for NLP tasks.
no code implementations • 28 Aug 2023 • Yihan Zhang, Hong Chang Ji, Ramji Venkataramanan, Marco Mondelli
Our methodology is general, and opens the way to the precise characterization of spiked matrices and of the corresponding spectral methods in a variety of settings.
no code implementations • 23 May 2023 • Francesco Pedrotti, Jan Maas, Marco Mondelli
Our key technical contribution is to provide convergence guarantees which require to run the forward process only for a fixed finite time $T_1$.
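To make the statement concrete, a toy sketch of an Ornstein-Uhlenbeck forward process run only up to a fixed finite time $T_1$; the step size and initial data distribution are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
T1, steps = 2.0, 200
dt = T1 / steps
x = rng.normal(loc=3.0, size=5000)  # toy data distribution

# Euler-Maruyama discretization of dX_t = -X_t dt + sqrt(2) dB_t.
for _ in range(steps):
    x = x - x * dt + np.sqrt(2 * dt) * rng.normal(size=x.shape)

print(x.mean(), x.var())  # approaches N(0, 1) only as T1 grows
```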
no code implementations • 20 May 2023 • Simone Bombari, Marco Mondelli
Deep learning models can be vulnerable to recovery attacks, raising privacy concerns for users, and widespread algorithms such as empirical risk minimization (ERM) often do not directly enforce safety guarantees.
no code implementations • 7 Feb 2023 • Teng Fu, Yuhao Liu, Jean Barbier, Marco Mondelli, Shansuo Liang, Tianqi Hou
We study the performance of a Bayesian statistician who estimates a rank-one signal corrupted by non-symmetric rotationally invariant noise with a generic distribution of singular values.
1 code implementation • 3 Feb 2023 • Simone Bombari, Shayan Kiyani, Marco Mondelli
However, this "universal" law provides only a necessary condition for robustness, and it is unable to discriminate between models.
no code implementations • 27 Dec 2022 • Alexander Shevchenko, Kevin Kögler, Hamed Hassani, Marco Mondelli
Autoencoders are a popular model in many branches of machine learning and lossy data compression.
1 code implementation • 3 Dec 2022 • Yizhou Xu, Tianqi Hou, Shansuo Liang, Marco Mondelli
We consider the problem of reconstructing the signal and the hidden variables from observations coming from a multi-layer network with rotationally invariant weight matrices.
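A quick sketch of what rotational invariance means for a weight matrix: $W = U S V^\top$ with Haar-distributed $U, V$ and a generic singular-value profile (the spectrum chosen below is an illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

def haar_orthogonal(n):
    # QR of a Gaussian matrix, with sign correction, is Haar distributed.
    Q, R = np.linalg.qr(rng.normal(size=(n, n)))
    return Q * np.sign(np.diag(R))

U, V = haar_orthogonal(n), haar_orthogonal(n)
s = rng.uniform(0.5, 1.5, size=n)  # generic singular values
W = U @ np.diag(s) @ V.T           # rotationally invariant weight matrix
```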
no code implementations • 21 Nov 2022 • Yihan Zhang, Marco Mondelli, Ramji Venkataramanan
In a mixed generalized linear model, the objective is to learn multiple signals from unlabeled observations: each sample comes from exactly one signal, but it is not known which one.
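A hedged sketch of such data: two hidden signals, each sample generated by exactly one of them, with the assignment unobserved. Two components and a linear link are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 1000
beta1, beta2 = rng.normal(size=d), rng.normal(size=d)

X = rng.normal(size=(n, d))
z = rng.integers(0, 2, size=n)  # hidden component labels
signal = np.where(z[:, None] == 0, beta1, beta2)
y = np.einsum("ij,ij->i", X, signal) + 0.1 * rng.normal(size=n)
# The learner observes only (X, y); z is never revealed.
```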
no code implementations • 8 Nov 2022 • Massimo Fornasier, Timo Klock, Marco Mondelli, Michael Rauchensteiner
Artificial neural networks are functions depending on a finite number of parameters typically encoded as weights and biases.
no code implementations • 13 Oct 2022 • Diyuan Wu, Vyacheslav Kungurtsev, Marco Mondelli
In this paper, we focus on neural networks with two and three layers and provide a rigorous understanding of the properties of the solutions found by the stochastic heavy ball method (SHB): (i) stability after dropping out part of the neurons, (ii) connectivity along a low-loss path, and (iii) convergence to the global optimum.
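For reference, the SHB update rule on a toy objective; the step size, momentum parameter, and quadratic loss are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.normal(size=10)
v = np.zeros_like(w)
lr, beta = 0.1, 0.9

for _ in range(100):
    grad = w + 0.1 * rng.normal(size=w.shape)  # stochastic gradient of ||w||^2 / 2
    v = beta * v - lr * grad                   # heavy-ball momentum
    w = w + v
print(np.linalg.norm(w))
```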
1 code implementation • 3 Oct 2022 • Jean Barbier, Francesco Camilli, Marco Mondelli, Manuel Saenz
To answer this, we study the paradigmatic spiked matrix model of principal components analysis (PCA), where a rank-one matrix is corrupted by additive noise.
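A minimal sketch of this model: a rank-one spike buried in Wigner noise, with the overlap between the top eigenvector and the spike as the recovery metric (the SNR value is an illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(0)
n, snr = 500, 2.0
u = rng.normal(size=n)
u /= np.linalg.norm(u)

G = rng.normal(size=(n, n))
noise = (G + G.T) / np.sqrt(2 * n)  # Wigner noise
Y = snr * np.outer(u, u) + noise    # spiked matrix

eigvals, eigvecs = np.linalg.eigh(Y)
overlap = abs(eigvecs[:, -1] @ u)   # PCA estimate vs. ground truth
print(f"overlap with the spike: {overlap:.2f}")
```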
no code implementations • 20 May 2022 • Jean Barbier, Tianqi Hou, Marco Mondelli, Manuel Sáenz
We consider the problem of estimating a rank-1 signal corrupted by structured rotationally invariant noise, and address the following question: how well do inference algorithms perform when the noise statistics are unknown and hence Gaussian noise is assumed?
no code implementations • 20 May 2022 • Simone Bombari, Mohammad Hossein Amani, Marco Mondelli
The Neural Tangent Kernel (NTK) has emerged as a powerful tool to provide memorization, optimization and generalization guarantees in deep neural networks.
no code implementations • 17 May 2022 • Mohammad Hossein Amani, Simone Bombari, Marco Mondelli, Rattana Pukdee, Stefano Rini
In this paper, we study the compression of a target two-layer neural network with $N$ nodes into a compressed network with $M < N$ nodes.
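As a baseline illustration of the setup (not the paper's method), one can fit an $M$-node student to the outputs of an $N$-node target by plain gradient descent; all choices below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
d, N, M = 10, 64, 8

target = torch.nn.Sequential(
    torch.nn.Linear(d, N), torch.nn.ReLU(), torch.nn.Linear(N, 1)
)
student = torch.nn.Sequential(
    torch.nn.Linear(d, M), torch.nn.ReLU(), torch.nn.Linear(M, 1)
)

X = torch.randn(2000, d)
with torch.no_grad():
    y = target(X)  # the compressed net must mimic these outputs

opt = torch.optim.Adam(student.parameters(), lr=1e-2)
for _ in range(500):
    opt.zero_grad()
    loss = torch.mean((student(X) - y) ** 2)
    loss.backward()
    opt.step()
print(float(loss))
```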
no code implementations • 8 Dec 2021 • Ramji Venkataramanan, Kevin Kögler, Marco Mondelli
We consider the problem of signal estimation in generalized linear models defined via rotationally invariant design matrices.
no code implementations • 3 Nov 2021 • Alexander Shevchenko, Vyacheslav Kungurtsev, Marco Mondelli
Understanding the properties of neural networks trained via stochastic gradient descent (SGD) is at the heart of the theory of deep learning.
no code implementations • NeurIPS 2021 • Marco Mondelli, Ramji Venkataramanan
However, the existing analysis of AMP requires an initialization that is both correlated with the signal and independent of the noise, which is often unrealistic in practice.
1 code implementation • NeurIPS 2021 • Quynh Nguyen, Pierre Brechet, Marco Mondelli
More specifically, we show that: (i) under generic assumptions on the features of intermediate layers, it suffices that the last two hidden layers have on the order of $\sqrt{N}$ neurons, and (ii) if subsets of features at each layer are linearly separable, then no over-parameterization is needed to show the connectivity.
no code implementations • 24 Dec 2020 • Seyyed Ali Hashemi, Marco Mondelli, Arman Fazeli, Alexander Vardy, John Cioffi, Andrea Goldsmith
In particular, when the number of processing elements $P$ that can perform SSC decoding operations in parallel is limited, as is the case in practice, the latency of SSC decoding is $O\left(N^{1-1/\mu}+\frac{N}{P}\log_2\log_2\frac{N}{P}\right)$, where $N$ is the block length of the code and $\mu$ is the scaling exponent of the channel.
Information Theory
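To read the latency expression numerically (constants are dropped, so only the scaling is meaningful), one can plug in sample values; $N$, $P$, and $\mu \approx 3.63$, a value often quoted for the binary erasure channel, are illustrative assumptions.

```python
import math

def ssc_latency(N, P, mu):
    # N^(1 - 1/mu) + (N/P) * log2(log2(N/P)), up to constants
    return N ** (1 - 1 / mu) + (N / P) * math.log2(math.log2(N / P))

N, P, mu = 2 ** 20, 64, 3.63
print(f"{ssc_latency(N, P, mu):.0f}")  # scaling only; constants omitted
```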
no code implementations • 21 Dec 2020 • Quynh Nguyen, Marco Mondelli, Guido Montufar
In this paper, we provide tight bounds on the smallest eigenvalue of NTK matrices for deep ReLU nets, both in the limiting case of infinite widths and for finite widths.
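The quantity in question can be probed empirically: the smallest eigenvalue of the NTK Gram matrix $K = J J^\top$, with $J$ the Jacobian of the network outputs in the parameters. Width, depth, and sample size below are illustrative assumptions.

```python
import torch

torch.manual_seed(0)
net = torch.nn.Sequential(
    torch.nn.Linear(10, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 100), torch.nn.ReLU(),
    torch.nn.Linear(100, 1),
)
X = torch.randn(50, 10)

def flat_grad(x):
    # Gradient of the scalar output with respect to all parameters.
    net.zero_grad()
    net(x.unsqueeze(0)).sum().backward()
    return torch.cat([p.grad.flatten() for p in net.parameters()])

J = torch.stack([flat_grad(x) for x in X])  # (n, num_params)
K = J @ J.T                                 # empirical NTK Gram matrix
print(torch.linalg.eigvalsh(K).min())
```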
no code implementations • 7 Oct 2020 • Marco Mondelli, Ramji Venkataramanan
We consider the problem of estimating a signal from measurements obtained via a generalized linear model.
no code implementations • 7 Aug 2020 • Marco Mondelli, Christos Thrampoulidis, Ramji Venkataramanan
This allows us to compute the Bayes-optimal combination of $\hat{\boldsymbol x}^{\rm L}$ and $\hat{\boldsymbol x}^{\rm s}$, given the limiting distribution of the signal $\boldsymbol x$.
no code implementations • NeurIPS 2020 • Quynh Nguyen, Marco Mondelli
Recent works have shown that gradient descent can find a global minimum for over-parameterized neural networks where the widths of all the hidden layers scale polynomially with $N$ ($N$ being the number of training samples).
no code implementations • ICML 2020 • Alexander Shevchenko, Marco Mondelli
In this paper, we shed light on this phenomenon: we show that the combination of stochastic gradient descent (SGD) and over-parameterization makes the landscape of multilayer neural networks approximately connected and thus more favorable to optimization.
no code implementations • 5 Jan 2019 • Adel Javanmard, Marco Mondelli, Andrea Montanari
We prove that, in the limit in which the number of neurons diverges, the evolution of gradient descent converges to a Wasserstein gradient flow in the space of probability distributions over $\Omega$.
no code implementations • 20 Feb 2018 • Marco Mondelli, Andrea Montanari
Our conclusion holds for a 'natural data distribution', namely standard Gaussian feature vectors $\boldsymbol x$, and output distributed according to a two-layer neural network with random isotropic weights, and under a certain complexity-theoretic assumption on tensor decomposition.
no code implementations • 20 Aug 2017 • Marco Mondelli, Andrea Montanari
In phase retrieval we want to recover an unknown signal $\boldsymbol x\in\mathbb C^d$ from $n$ quadratic measurements of the form $y_i = |\langle{\boldsymbol a}_i,{\boldsymbol x}\rangle|^2+w_i$ where $\boldsymbol a_i\in \mathbb C^d$ are known sensing vectors and $w_i$ is measurement noise.
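A minimal sketch of this measurement model, together with a vanilla spectral estimate (the top eigenvector of a weighted covariance; the identity preprocessing used here is a naive choice, not the optimal one studied in the paper).

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 64, 640
x = rng.normal(size=d) + 1j * rng.normal(size=d)
x /= np.linalg.norm(x)

A = (rng.normal(size=(n, d)) + 1j * rng.normal(size=(n, d))) / np.sqrt(2)
y = np.abs(A @ x) ** 2 + 0.05 * rng.normal(size=n)  # quadratic measurements

# Spectral estimate: top eigenvector of (1/n) sum_i y_i a_i a_i^*.
D = (A.conj().T * y) @ A / n
w, V = np.linalg.eigh(D)
x_hat = V[:, -1]
print(abs(np.vdot(x_hat, x)))  # overlap with the truth (up to a global phase)
```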