no code implementations • 26 Mar 2024 • Victor Leger, Romain Couillet
This article considers a semi-supervised classification setting on a Gaussian mixture model, where the data is not labeled strictly as usual, but instead with uncertain labels.
no code implementations • 21 Feb 2024 • Victor Leger, Romain Couillet
This article conducts a large dimensional study of a simple yet quite versatile classification model, encompassing at once multi-task and semi-supervised learning, and taking into account uncertain labeling.
no code implementations • 19 Feb 2024 • Hugo Lebeau, Florent Chatelain, Romain Couillet
The performance of spectral clustering relies on the fluctuations of the entries of the eigenvectors of a similarity matrix, which has been left uncharacterized until now.
no code implementations • 5 Feb 2024 • Hugo Lebeau, Florent Chatelain, Romain Couillet
This work presents a comprehensive understanding of the estimation of a planted low-rank signal from a general spiked tensor model near the computational threshold.
1 code implementation • 3 Mar 2023 • Minh-Toan Nguyen, Romain Couillet
In the supervised case, we derive a simple algorithm that attains the Bayes optimal performance.
no code implementations • 23 Dec 2021 • Mohamed El Amine Seddik, Maxime Guillaud, Romain Couillet
Relying on random matrix theory (RMT), this paper studies asymmetric order-$d$ spiked tensor models with Gaussian noise.
no code implementations • 1 Nov 2021 • Malik Tiomoko, Romain Couillet, Frédéric Pascal
The article proposes and theoretically analyses a \emph{computationally efficient} multi-task learning (MTL) extension of popular principal component analysis (PCA)-based supervised learning schemes \cite{barshan2011supervised, bair2006prediction}.
1 code implementation • 9 Oct 2021 • Sami Fakhry, Romain Couillet, Malik Tiomoko
This article proposes a distributed multi-task learning (MTL) algorithm based on supervised principal component analysis (SPCA) which is: (i) theoretically optimal for Gaussian mixtures, (ii) computationally cheap and scalable.
2 code implementations • ICLR 2022 • Hafiz Tiomoko Ali, Zhenyu Liao, Romain Couillet
As a result, for any kernel matrix ${\bf K}$ of the form above, we propose a novel random features technique, called Ternary Random Feature (TRF), that (i) asymptotically yields the same limiting kernel as the original ${\bf K}$ in a spectral sense and (ii) can be computed and stored much more efficiently, by wisely tuning (in a data-dependent manner) the function $\sigma$ and the random vector ${\bf w}$, both taking values in $\{-1, 0, 1\}$.
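A minimal sketch of a ternary random feature map in this spirit (the data-dependent tuning of $\sigma$ and ${\bf w}$ described in the paper is omitted; the sparsity level and activation threshold below are illustrative assumptions):

```python
import numpy as np

def ternary_random_features(X, n_features, p_zero=0.5, seed=0):
    """Random feature map where both the weights W and the activation
    outputs take values in {-1, 0, 1}, so features can be stored and
    multiplied cheaply (sparsity and threshold are illustrative)."""
    rng = np.random.default_rng(seed)
    n, p = X.shape
    # Ternary weights: 0 with probability p_zero, else +/-1 equally likely.
    W = rng.choice([-1, 0, 1], size=(p, n_features),
                   p=[(1 - p_zero) / 2, p_zero, (1 - p_zero) / 2])
    Z = X @ W / np.sqrt(p)
    # Ternary activation: sign with a dead zone around the origin.
    return np.sign(Z) * (np.abs(Z) > 0.5)

X = np.random.default_rng(1).standard_normal((100, 20))
F = ternary_random_features(X, 256)
K_approx = F @ F.T / F.shape[1]  # implicit limiting-kernel estimate
```

The point of the construction is that `F` costs two bits per entry to store and `K_approx` needs no floating-point multiplications, while the induced kernel converges (after the paper's tuning) to the same limit as the original one.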
no code implementations • 6 Sep 2021 • Cosme Louart, Romain Couillet
Given a random matrix $X= (x_1,\ldots, x_n)\in \mathcal M_{p, n}$ with independent columns and satisfying concentration of measure hypotheses and a parameter $z$ whose distance to the spectrum of $\frac{1}{n} XX^T$ should not depend on $p, n$, it was previously shown that the functionals $\text{tr}(AR(z))$, for $R(z) = (\frac{1}{n}XX^T- zI_p)^{-1}$ and $A\in \mathcal M_{p}$ deterministic, have a standard deviation of order $O(\|A\|_* / \sqrt n)$.
no code implementations • 2 Aug 2021 • José Henrique de Morais Goulart, Romain Couillet, Pierre Comon
A numerical verification provides evidence that the same holds for orders 4 and 5, leading us to conjecture that, for any order, our fixed-point equation is equivalent to the known characterization of the ML estimation performance that had been obtained by relying on spin glasses.
1 code implementation • 5 Mar 2021 • Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay
This article unveils a new relation between the Nishimori temperature parametrizing a distribution P and the Bethe free energy on random Erdos-Renyi graphs with edge weights distributed according to P. Since estimating the Nishimori temperature is a task of major importance in Bayesian inference problems, as a practical corollary of this new relation, a numerical method is proposed to accurately estimate it from the eigenvalues of the Bethe Hessian matrix of the weighted graph.
1 code implementation • 24 Feb 2021 • Romain Couillet, Florent Chatelain, Nicolas Le Bihan
The article introduces an elementary cost and storage reduction method for spectral clustering and principal component analysis.
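A toy illustration of this kind of entry-wise cost reduction: puncture (zero out) most off-diagonal entries of the Gram matrix before the eigendecomposition and check that the leading eigenvector still separates two classes. The synthetic mixture, the kept-entry fraction `eps`, and the signal strength are illustrative assumptions, not the paper's setting:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, eps = 400, 200, 0.3        # eps: fraction of kept off-diagonal entries
y = np.repeat([1.0, -1.0], n // 2)
# Two-class Gaussian mixture with means +/- mu, ||mu|| = 3
X = rng.standard_normal((n, p)) + np.outer(y, 3.0 * np.ones(p) / np.sqrt(p))
K = X @ X.T / p                              # full Gram matrix
mask = np.triu(rng.random((n, n)) < eps, 1)  # keep ~eps of the entries
B = np.where(mask | mask.T, K, 0.0) / eps    # punctured, rescaled Gram matrix
vals, vecs = np.linalg.eigh(B)
v = vecs[:, -1]                              # leading eigenvector
labels = v > 0
acc = max(np.mean(labels == (y > 0)), np.mean(labels != (y > 0)))
```

Only about `eps * n**2 / 2` Gram entries are ever computed or stored, yet the class structure survives in the dominant eigenvector.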
no code implementations • 16 Feb 2021 • Cosme Louart, Romain Couillet
Starting from concentration of measure hypotheses on $m$ random vectors $Z_1,\ldots, Z_m$, this article provides an expression of the concentration of functionals $\phi(Z_1,\ldots, Z_m)$ where the variations of $\phi$ on each variable depend on the product of the norms (or semi-norms) of the other variables (as if $\phi$ were a product).
no code implementations • ICLR 2021 • Malik Tiomoko, Hafiz Tiomoko Ali, Romain Couillet
This article provides theoretical insights into the inner workings of multi-task and transfer learning methods, by studying the tractable least-square support vector machine multi-task learning (LS-SVM MTL) method, in the limit of large ($p$) and numerous ($n$) data.
1 code implementation • CONLL 2020 • Romain Couillet, Yagmur Gizem Cinar, Eric Gaussier, Muhammad Imran
This article establishes that, unlike the legacy tf*idf representation, recent natural language representations (word embedding vectors) tend to exhibit a so-called \textit{concentration of measure phenomenon}, in the sense that, as the representation size $p$ and database size $n$ are both large, their behavior is similar to that of large dimensional Gaussian random vectors.
no code implementations • ICLR 2021 • Zhenyu Liao, Romain Couillet, Michael W. Mahoney
Given a large data matrix, sparsifying, quantizing, and/or performing other entry-wise nonlinear operations can have numerous benefits, ranging from speeding up iterative algorithms for core numerical linear algebra problems to providing nonlinear filters to design state-of-the-art neural network models.
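A quick numerical illustration of why such operations are benign spectrally: ternary quantization of a Gaussian matrix changes its singular value profile only by a global scale. The threshold `t` and matrix sizes are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((300, 600))
t = 0.8
# Entry-wise sparsification + quantization: keep only the sign of large entries.
X_q = np.sign(X) * (np.abs(X) > t)
# Rescale so the quantized entries have unit variance, like those of X.
X_qn = X_q / X_q.std()
s = np.linalg.svd(X, compute_uv=False)
s_qn = np.linalg.svd(X_qn, compute_uv=False)
rel_gap = abs(s[0] - s_qn[0]) / s[0]   # top singular values nearly coincide
```

After rescaling, both matrices have i.i.d. unit-variance entries, so by universality their singular value distributions share the same Marchenko-Pastur-type limit.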
no code implementations • 3 Sep 2020 • Malik Tiomoko, Romain Couillet, Hafiz Tiomoko
Multi-task learning (MTL) efficiently leverages useful information contained in multiple related tasks to help improve the generalization performance of all tasks.
no code implementations • 17 Jun 2020 • Cosme Louart, Romain Couillet
This article studies the \emph{robust covariance matrix estimation} of a data collection $X = (x_1,\ldots, x_n)$ with $x_i = \sqrt \tau_i z_i + m$, where $z_i \in \mathbb R^p$ is a \textit{concentrated vector} (e.g., an elliptical random vector), $m\in \mathbb R^p$ a deterministic signal and $\tau_i\in \mathbb R$ a scalar perturbation of possibly large amplitude, under the assumption that both $n$ and $p$ are large.
no code implementations • 13 Jun 2020 • Xiaoyi Mai, Romain Couillet
Semi-supervised Laplacian regularization, a standard graph-based approach for learning from both labelled and unlabelled data, was recently shown to exploit unlabelled data inefficiently in high dimensions (Mai and Couillet 2018), causing it to be outperformed by its unsupervised counterpart, spectral clustering, given sufficient unlabelled data.
no code implementations • NeurIPS 2020 • Zhenyu Liao, Romain Couillet, Michael W. Mahoney
This article characterizes the exact asymptotics of random Fourier feature (RFF) regression, in the realistic setting where the number of data samples $n$, their dimension $p$, and the dimension of feature space $N$ are all large and comparable.
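For reference, the estimator being analyzed can be sketched as standard Gaussian-kernel RFF ridge regression; the bandwidth, regularization, and feature count below are illustrative assumptions, not the paper's choices:

```python
import numpy as np

def rff_ridge(X_tr, y_tr, X_te, N=512, sigma=2.0, lam=1e-2, seed=0):
    """Ridge regression on random Fourier features approximating the
    Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    p = X_tr.shape[1]
    W = rng.standard_normal((p, N)) / sigma
    b = rng.uniform(0.0, 2.0 * np.pi, N)
    phi = lambda X: np.sqrt(2.0 / N) * np.cos(X @ W + b)
    Phi = phi(X_tr)
    # Ridge solution in feature space: (Phi^T Phi + lam I)^{-1} Phi^T y
    beta = np.linalg.solve(Phi.T @ Phi + lam * np.eye(N), Phi.T @ y_tr)
    return phi(X_te) @ beta

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = np.sin(X[:, 0])              # smooth target
y_hat = rff_ridge(X, y, X)       # in-sample fit
```

The regime studied in the article is precisely the one above: $n = 200$, $p = 5$, and $N = 512$ all of comparable magnitude, where the classical $N \to \infty$ kernel approximation argument no longer applies.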
1 code implementation • NeurIPS 2020 • Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay
This article considers the problem of community detection in sparse dynamical graphs in which the community structure evolves over time.
1 code implementation • 20 Mar 2020 • Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay
This article considers spectral community detection in the regime of sparse networks with heterogeneous degree distributions, for which we devise an algorithm to efficiently retrieve communities.
no code implementations • ICML 2020 • Mohamed El Amine Seddik, Cosme Louart, Mohamed Tamaazousti, Romain Couillet
This paper shows that deep learning (DL) representations of data produced by generative adversarial nets (GANs) are random vectors which fall within the class of so-called \textit{concentrated} random vectors.
no code implementations • 3 Dec 2019 • Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay
Regularization of the classical Laplacian matrices was empirically shown to improve spectral clustering in sparse networks.
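A minimal sketch of degree-regularized spectral clustering on a sparse two-block graph. The regularizer choice `tau` = average degree and the toy stochastic block model are illustrative assumptions:

```python
import numpy as np

def reg_spectral_labels(A, tau=None):
    """Cluster a two-community graph with the regularized Laplacian
    L_tau = D_tau^{-1/2} A D_tau^{-1/2}, where D_tau = D + tau * I."""
    d = A.sum(axis=1)
    tau = d.mean() if tau is None else tau
    s = 1.0 / np.sqrt(d + tau)
    L = A * np.outer(s, s)
    _, vecs = np.linalg.eigh(L)
    # The sign of the second dominant eigenvector splits the communities.
    return vecs[:, -2] > 0

rng = np.random.default_rng(0)
n = 200
y = np.repeat([True, False], n // 2)
P = np.where(np.equal.outer(y, y), 0.20, 0.04)  # SBM edge probabilities
A = np.triu((rng.random((n, n)) < P).astype(float), 1)
A = A + A.T
labels = reg_spectral_labels(A)
acc = max(np.mean(labels == y), np.mean(labels != y))
```

Adding `tau` to the degrees tames the spurious eigenvectors that low-degree nodes create in sparse graphs, which is the empirical improvement the article sets out to explain.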
no code implementations • 15 Sep 2019 • Zhenyu Liao, Romain Couillet
This article investigates the eigenspectrum of the inner product-type kernel matrix $\sqrt{p} \mathbf{K}=\{f( \mathbf{x}_i^{\sf T} \mathbf{x}_j/\sqrt{p})\}_{i, j=1}^n $ under a binary mixture model in the high dimensional regime where the number of data $n$ and their dimension $p$ are both large and comparable.
no code implementations • ICLR 2019 • Mohamed El Amine Seddik, Mohamed Tamaazousti, Romain Couillet
In this paper, we present a random matrix approach to recover sparse principal components from $n$ $p$-dimensional vectors.
1 code implementation • 8 Mar 2019 • Malik Tiomoko, Romain Couillet
This article proposes a method to consistently estimate functionals $\frac1p\sum_{i=1}^pf(\lambda_i(C_1C_2))$ of the eigenvalues of the product of two covariance matrices $C_1, C_2\in\mathbb{R}^{p\times p}$ based on the empirical estimates $\lambda_i(\hat C_1\hat C_2)$ ($\hat C_a=\frac1{n_a}\sum_{i=1}^{n_a} x_i^{(a)}x_i^{(a){{\sf T}}}$), when the size $p$ and number $n_a$ of the (zero mean) samples $x_i^{(a)}$ are similar.
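For context, the naive plug-in counterpart of such estimators, which simply substitutes the sample covariances and which the article corrects in the regime $p \sim n_a$, can be sketched as follows (all names illustrative):

```python
import numpy as np

def plugin_functional(X1, X2, f):
    """Naive estimate of (1/p) * sum_i f(lambda_i(C1 C2)) from samples:
    replace C1, C2 by sample covariances and average f over the
    eigenvalues of their product (consistent only when n1, n2 >> p)."""
    C1_hat = X1.T @ X1 / X1.shape[0]
    C2_hat = X2.T @ X2 / X2.shape[0]
    eigs = np.linalg.eigvals(C1_hat @ C2_hat).real
    return float(np.mean(f(eigs)))

rng = np.random.default_rng(0)
p, n = 10, 5000
est = plugin_functional(rng.standard_normal((n, p)),
                        rng.standard_normal((n, p)), lambda x: x)
# With C1 = C2 = I and n >> p, the estimate is close to f(1) = 1
```

When $p$ and $n_a$ are comparable, the eigenvalues $\lambda_i(\hat C_1\hat C_2)$ spread far from the population ones and this plug-in average becomes severely biased, which is what the article's estimator fixes.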
no code implementations • 7 Feb 2019 • Malik Tiomoko, Florent Bouchard, Guillaume Ginholac, Romain Couillet
Relying on recent advances in statistical estimation of covariance distances based on random matrix theory, this article proposes an improved covariance and precision matrix estimation for a wide family of metrics.
no code implementations • NeurIPS 2019 • Lorenzo Dall'Amico, Romain Couillet, Nicolas Tremblay
Spectral clustering is one of the most popular, yet still incompletely understood, methods for community detection on graphs.
no code implementations • 8 Nov 2018 • Yacine Chitour, Zhenyu Liao, Romain Couillet
We translate a well-known empirical observation of linear neural nets into a conjecture that we call the \emph{overfitting conjecture} which states that, for almost all training data and initial conditions, the trajectory of the corresponding gradient descent system converges to a global minimum.
no code implementations • 10 Oct 2018 • Romain Couillet, Malik Tiomoko, Steeve Zozor, Eric Moisan
Given two sets $x_1^{(1)},\ldots, x_{n_1}^{(1)}$ and $x_1^{(2)},\ldots, x_{n_2}^{(2)}\in\mathbb{R}^p$ (or $\mathbb{C}^p$) of random vectors with zero mean and positive definite covariance matrices $C_1$ and $C_2\in\mathbb{R}^{p\times p}$ (or $\mathbb{C}^{p\times p}$), respectively, this article provides novel estimators for a wide range of distances between $C_1$ and $C_2$ (along with divergences between some zero mean and covariance $C_1$ or $C_2$ probability measures) of the form $\frac1p\sum_{i=1}^n f(\lambda_i(C_1^{-1}C_2))$ (with $\lambda_i(X)$ the eigenvalues of matrix $X$).
no code implementations • 16 Jun 2018 • Hafiz Tiomoko Ali, Sijia Liu, Yasin Yilmaz, Romain Couillet, Indika Rajapakse, Alfred Hero
We propose a method for simultaneously detecting shared and unshared communities in heterogeneous multilayer weighted and undirected networks.
1 code implementation • ICML 2018 • Zhenyu Liao, Romain Couillet
Random feature maps are ubiquitous in modern statistical machine learning, where they generalize random projections by means of powerful, yet often difficult to analyze nonlinear operators.
no code implementations • ICML 2018 • Zhenyu Liao, Romain Couillet
Understanding the learning dynamics of neural networks is one of the key issues for the improvement of optimization algorithms as well as for the theoretical comprehension of why deep neural nets work so well today.
no code implementations • 9 Nov 2017 • Xiaoyi Mai, Romain Couillet
This article provides an original understanding of the behavior of a class of graph-oriented semi-supervised learning algorithms in the limit of large and numerous data.
1 code implementation • 1 Nov 2017 • Khalil Elkhalil, Abla Kammoun, Romain Couillet, Tareq Y. Al-Naffouri, Mohamed-Slim Alouini
This article carries out a large dimensional analysis of standard regularized discriminant analysis classifiers designed on the assumption that data arise from a Gaussian mixture model with different means and covariances.
1 code implementation • 17 Feb 2017 • Cosme Louart, Zhenyu Liao, Romain Couillet
This article studies the Gram random matrix model $G=\frac1T\Sigma^{\rm T}\Sigma$, $\Sigma=\sigma(WX)$, classically found in the analysis of random feature maps and random neural networks, where $X=[x_1,\ldots, x_T]\in{\mathbb R}^{p\times T}$ is a (data) matrix of bounded norm, $W\in{\mathbb R}^{n\times p}$ is a matrix of independent zero-mean unit variance entries, and $\sigma:{\mathbb R}\to{\mathbb R}$ is a Lipschitz continuous (activation) function --- $\sigma(WX)$ being understood entry-wise.
1 code implementation • 11 Jan 2017 • Zhenyu Liao, Romain Couillet
In this article, a large dimensional performance analysis of kernel least squares support vector machines (LS-SVMs) is provided under the assumption of a two-class Gaussian mixture model for the input data.
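The classifier under study solves a single linear system; a bare-bones version with a Gaussian kernel (the hyperparameters and the toy mixture are illustrative assumptions):

```python
import numpy as np

def lssvm(X, y, gamma=1.0, sigma2=4.0):
    """Kernel LS-SVM: solve [[0, 1^T], [1, K + I/gamma]] [b; alpha] = [0; y],
    then classify via g(x) = sum_i alpha_i K(x, x_i) + b."""
    n = X.shape[0]
    def kern(A, B):
        sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-sq / (2.0 * sigma2))
    M = np.zeros((n + 1, n + 1))
    M[0, 1:] = M[1:, 0] = 1.0
    M[1:, 1:] = kern(X, X) + np.eye(n) / gamma
    sol = np.linalg.solve(M, np.r_[0.0, y])
    b, alpha = sol[0], sol[1:]
    return lambda Xt: kern(Xt, X) @ alpha + b

rng = np.random.default_rng(0)
n, p = 100, 4
y = np.repeat([1.0, -1.0], n // 2)
X = rng.standard_normal((n, p))
X[:, 0] += 2.0 * y               # two-class Gaussian mixture, means +/- 2 e_1
g = lssvm(X, y)
acc = np.mean(np.sign(g(X)) == y)
```

Because training reduces to one $(n+1)\times(n+1)$ linear system, the decision score $g(x)$ is an explicit function of the kernel matrix, which is what makes the exact large-$(n, p)$ analysis of the article tractable.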
no code implementations • 3 Nov 2016 • Hafiz Tiomoko Ali, Romain Couillet
The analysis of this equivalent spiked random matrix allows us to improve spectral methods for community detection and assess their performances in the regime under study.
no code implementations • 7 Sep 2016 • Zhenyu Liao, Romain Couillet
This article proposes a performance analysis of kernel least squares support vector machines (LS-SVMs) based on a random matrix approach, in the regime where both the dimension of data $p$ and their number $n$ grow large at the same rate.
no code implementations • 25 Mar 2016 • Romain Couillet, Gilles Wainrib, Harry Sevi, Hafiz Tiomoko Ali
In this article, a study of the mean-square error (MSE) performance of linear echo-state neural networks is performed, both for training and testing tasks.
no code implementations • 4 Mar 2015 • David Morales-Jimenez, Romain Couillet, Matthew R. McKay
A large dimensional characterization of robust M-estimators of covariance (or scatter) is provided under the assumption that the dataset comprises independent (essentially Gaussian) legitimate samples as well as arbitrary deterministic samples, referred to as outliers.