no code implementations • 10 Jul 2023 • Kevin Scaman, Mathieu Even, Laurent Massoulié
In this paper, we provide a novel framework for the analysis of generalization error of first-order optimization algorithms for statistical learning when the gradient can only be accessed through partial observations given by an oracle.
no code implementations • 5 Jun 2023 • Batiste Le Bars, Aurélien Bellet, Marc Tommasi, Kevin Scaman, Giovanni Neglia
On the contrary, we show, for convex, strongly convex and non-convex functions, that D-SGD can always recover generalization bounds analogous to those of classical SGD, suggesting that the choice of graph does not matter.
no code implementations • 19 Jan 2023 • David A. R. Robin, Kevin Scaman, Marc Lelarge
In this paper, we present a new strategy to prove the convergence of deep learning architectures to a zero training (or even testing) loss by gradient flow.
1 code implementation • 21 Nov 2022 • Yann Fraboni, Martin Van Waerebeke, Kevin Scaman, Richard Vidal, Laetitia Kameni, Marco Lorenzi
Machine Unlearning (MU) is an increasingly important topic in machine learning safety, aiming at removing the contribution of a given data point from a training procedure.
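To make "removing the contribution of a data point" concrete, here is a hedged sketch of the exact-unlearning reference point: retraining from scratch on the dataset without that point. The paper's federated unlearning method is not reproduced here; the ridge-regression model and data are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(5)

def fit_ridge(X, y, lam=1e-2):
    """Closed-form ridge regression, used as a stand-in 'training procedure'."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

X = rng.standard_normal((100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * rng.standard_normal(100)

w_full = fit_ridge(X, y)

# "Exact" unlearning of point k: retrain from scratch without it. Practical
# unlearning methods aim to approximate w_unlearned at a fraction of this cost.
k = 17
mask = np.ones(len(y), dtype=bool)
mask[k] = False
w_unlearned = fit_ridge(X[mask], y[mask])

print(np.linalg.norm(w_full - w_unlearned))  # how much that one point shifted the model
```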
no code implementations • NeurIPS 2021 • Alain Durmus, Eric Moulines, Alexey Naumov, Sergey Samsonov, Kevin Scaman, Hoi-To Wai
This family of methods arises in many machine learning tasks and is used to obtain approximate solutions of a linear system $\bar{A}\theta = \bar{b}$ for which $\bar{A}$ and $\bar{b}$ can only be accessed through random estimates $\{({\bf A}_n, {\bf b}_n): n \in \mathbb{N}^*\}$.
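To make the setting concrete, a minimal NumPy sketch of a linear stochastic approximation iteration $\theta_{n+1} = \theta_n - \gamma_n(\mathbf{A}_n\theta_n - \mathbf{b}_n)$, where only random estimates of $(\bar{A}, \bar{b})$ are observed. The matrices, noise level, and step-size schedule below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 5

# Target linear system A_bar theta = b_bar, with A_bar made positive definite
# so the iteration below is stable (illustrative choice).
M = rng.standard_normal((d, d))
A_bar = M @ M.T + np.eye(d)
b_bar = rng.standard_normal(d)
theta_star = np.linalg.solve(A_bar, b_bar)

# Linear stochastic approximation: only noisy estimates (A_n, b_n) are observed.
theta = np.zeros(d)
for n in range(1, 20001):
    A_n = A_bar + 0.1 * rng.standard_normal((d, d))  # random estimate of A_bar
    b_n = b_bar + 0.1 * rng.standard_normal(d)       # random estimate of b_bar
    gamma_n = 1.0 / (n + 100)                        # decreasing step size
    theta = theta - gamma_n * (A_n @ theta - b_n)

print("error:", np.linalg.norm(theta - theta_star))
```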
1 code implementation • 8 Mar 2021 • George Dasoulas, Kevin Scaman, Aladin Virmaux
To address this issue, we provide a theoretical analysis of the Lipschitz continuity of attention modules and introduce LipschitzNorm, a simple and parameter-free normalization for self-attention mechanisms that enforces Lipschitz continuity of the model.
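As a rough illustration of the idea (a hedged sketch, not the exact LipschitzNorm rule from the paper): one simple way to keep self-attention scores bounded regardless of the input scale is to divide the raw dot-product scores by the query and key norms before the softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def norm_scaled_attention(Q, K, V, eps=1e-6):
    """Hedged sketch: scores are divided by ||q_i|| * ||k_j||, so every score lies
    in [-1, 1] whatever the input scale. This illustrates the general idea of
    normalizing self-attention for Lipschitz continuity, not the paper's formula."""
    scores = Q @ K.T                                       # (n, n) raw dot products
    q_norm = np.linalg.norm(Q, axis=-1, keepdims=True)     # (n, 1)
    k_norm = np.linalg.norm(K, axis=-1, keepdims=True).T   # (1, n)
    weights = softmax(scores / (q_norm * k_norm + eps))
    return weights @ V

Q = K = V = np.random.randn(10, 16)
out = norm_scaled_attention(Q, K, V)   # shape (10, 16)
```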
no code implementations • 17 Feb 2021 • George Dasoulas, Giannis Nikolentzos, Kevin Scaman, Aladin Virmaux, Michalis Vazirgiannis
Machine learning on graph-structured data has attracted considerable research interest due to the emergence of Graph Neural Networks (GNNs).
no code implementations • 17 Feb 2021 • Avery Ma, Aladin Virmaux, Kevin Scaman, Juwei Lu
Do all adversarial examples have the same consequences?
no code implementations • NeurIPS 2020 • Kevin Scaman, Ludovic Dos Santos, Merwan Barlier, Igor Colin
This novel smoothing method is then used to improve first-order non-smooth optimization (both convex and non-convex) by allowing for a local exploration of the search space.
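For intuition, here is a hedged sketch of classical Gaussian randomized smoothing, $f_\gamma(x) = \mathbb{E}_{Z\sim\mathcal{N}(0,I)}[f(x + \gamma Z)]$, whose gradient can be estimated by sampling even when $f$ is non-smooth; the paper's smoothing construction differs, and the test function and parameters below are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(1)

def f(x):
    # Non-smooth test function (illustrative): the L1 norm.
    return np.abs(x).sum()

def smoothed_grad(f, x, gamma=0.1, n_samples=2000):
    """Monte Carlo estimate of grad f_gamma(x) for f_gamma(x) = E[f(x + gamma Z)],
    Z ~ N(0, I), using grad f_gamma(x) = E[Z (f(x + gamma Z) - f(x))] / gamma.
    Subtracting f(x) is a simple variance-reduction baseline."""
    Z = rng.standard_normal((n_samples, x.size))
    vals = np.array([f(x + gamma * z) for z in Z]) - f(x)
    return (Z * vals[:, None]).mean(axis=0) / gamma

x = np.array([1.0, -2.0, 0.5])
print(smoothed_grad(f, x))  # roughly sign(x) = [1, -1, 1] away from zero
```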
no code implementations • NeurIPS 2020 • Kevin Scaman, Cedric Malherbe
In the case of sub-Gaussian and centered noise, we prove that, with probability $1-\delta$, the number of iterations to reach a precision $\varepsilon$ for the squared gradient norm is $O(\varepsilon^{-2}\ln(1/\delta))$.
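Read concretely (simple arithmetic on the stated rate, not an additional claim from the paper): halving the target precision $\varepsilon$ multiplies the iteration count by $4$, whereas tightening the failure probability from $\delta = 10^{-1}$ to $\delta = 10^{-3}$ only triples the $\ln(1/\delta)$ factor.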
no code implementations • 1 Mar 2020 • George Dasoulas, Giannis Nikolentzos, Kevin Scaman, Aladin Virmaux, Michalis Vazirgiannis
Moreover, on graph classification tasks, we suggest using the generated structural embeddings to transform an attributed graph into a set of augmented node attributes.
no code implementations • 12 Dec 2019 • George Dasoulas, Ludovic Dos Santos, Kevin Scaman, Aladin Virmaux
In this paper, we show that a simple coloring scheme can improve, both theoretically and empirically, the expressive power of Message Passing Neural Networks (MPNNs).
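A minimal sketch of the general idea, assuming a simple variant in which each node receives a random color appended as a one-hot feature before message passing; the paper's exact coloring and aggregation schemes may differ.

```python
import numpy as np

rng = np.random.default_rng(0)

def add_random_coloring(X, n_colors=3):
    """Append a random one-hot 'color' to each node's features, so that nodes
    with identical attributes become distinguishable to a message-passing
    network. Illustrative sketch; the paper analyses a principled scheme."""
    n_nodes = X.shape[0]
    colors = rng.integers(0, n_colors, size=n_nodes)
    one_hot = np.eye(n_colors)[colors]
    return np.concatenate([X, one_hot], axis=1)

X = np.ones((6, 4))                 # six nodes with identical attributes
X_colored = add_random_coloring(X)  # shape (6, 7); identical nodes now differ
```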
no code implementations • NeurIPS 2019 • Igor Colin, Ludovic Dos Santos, Kevin Scaman
For smooth convex and non-convex objective functions, we provide matching lower and upper complexity bounds and show that a naive pipeline parallelization of Nesterov's accelerated gradient descent is optimal.
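For reference, a compact sketch of Nesterov's accelerated gradient descent on a smooth convex quadratic; the pipeline-parallel scheduling analysed in the paper is not reproduced here, and the problem and step sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 20
M = rng.standard_normal((d, d))
A = M.T @ M / d + 0.1 * np.eye(d)         # positive definite quadratic
b = rng.standard_normal(d)
grad = lambda x: A @ x - b                # gradient of 0.5 x'Ax - b'x
L = np.linalg.eigvalsh(A).max()           # smoothness constant

x = y = np.zeros(d)
t = 1.0
for _ in range(500):
    x_next = y - grad(y) / L                         # gradient step at the extrapolated point
    t_next = (1 + np.sqrt(1 + 4 * t**2)) / 2
    y = x_next + (t - 1) / t_next * (x_next - x)     # momentum / extrapolation
    x, t = x_next, t_next

print(np.linalg.norm(A @ x - b))          # close to zero at the optimum
```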
no code implementations • NeurIPS 2018 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
Under the global regularity assumption, we provide a simple yet efficient algorithm called distributed randomized smoothing (DRS) based on a local smoothing of the objective function, and show that DRS is within a $d^{1/4}$ multiplicative factor of the optimal convergence rate, where $d$ is the underlying dimension.
1 code implementation • NeurIPS 2018 • Kevin Scaman, Aladin Virmaux
First, we show that, even for two-layer neural networks, the exact computation of this quantity is NP-hard and that state-of-the-art methods may significantly overestimate it.
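The naive upper bound this line of work improves on fits in a few lines (a hedged sketch assuming 1-Lipschitz activations such as ReLU; the paper's AutoLip/SeqLip estimators are not reproduced here).

```python
import numpy as np

def product_of_spectral_norms(weight_matrices):
    """Naive Lipschitz upper bound for a feed-forward network with 1-Lipschitz
    activations (e.g. ReLU): the product of the layers' spectral norms.
    Easy to compute, but it can significantly overestimate the true Lipschitz
    constant, whose exact computation is NP-hard."""
    bound = 1.0
    for W in weight_matrices:
        bound *= np.linalg.norm(W, ord=2)   # largest singular value
    return bound

rng = np.random.default_rng(3)
layers = [rng.standard_normal((64, 32)), rng.standard_normal((32, 10))]
print(product_of_spectral_norms(layers))
```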
1 code implementation • NeurIPS 2018 • Moez Draief, Konstantin Kutzkov, Kevin Scaman, Milan Vojnovic
We present novel graph kernels for graphs with node and edge labels that have ordered neighborhoods, i.e., graphs in which neighbor nodes follow an order.
no code implementations • 15 Sep 2017 • Kevin Scaman, Argyris Kalogeratos, Luca Corinzia, Nicolas Vayatis
The Information Cascades Model captures the dynamical properties of user activity in a social network.
1 code implementation • ICML 2017 • Kevin Scaman, Francis Bach, Sébastien Bubeck, Yin Tat Lee, Laurent Massoulié
For centralized (i.e., master/slave) algorithms, we show that distributing Nesterov's accelerated gradient descent is optimal and achieves a precision $\varepsilon > 0$ in time $O(\sqrt{\kappa_g}(1+\Delta\tau)\ln(1/\varepsilon))$, where $\kappa_g$ is the condition number of the (global) function to optimize, $\Delta$ is the diameter of the network, and $\tau$ (resp. $1$) is the time needed to communicate values between two neighbors (resp. to perform local computations).
no code implementations • 26 Feb 2016 • Rémi Lemonnier, Kevin Scaman, Argyris Kalogeratos
In this paper, we present a framework for fitting multivariate Hawkes processes to large-scale problems, both in the number of events in the observed history $n$ and in the number of event types $d$ (i.e., dimensions).
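To fix notation, a small sketch of a multivariate Hawkes conditional intensity with exponential kernels, $\lambda_i(t) = \mu_i + \sum_j \sum_{t_k^j < t} \alpha_{ij}\, \beta\, e^{-\beta (t - t_k^j)}$; this parametrization is a common choice used here for illustration, not necessarily the kernels fitted in the paper.

```python
import numpy as np

def hawkes_intensity(t, events, mu, alpha, beta):
    """Conditional intensity of a multivariate Hawkes process with exponential
    kernels. `events` is a list of (time, type) pairs, `mu` the baseline rates
    of shape (d,), `alpha` the excitation matrix of shape (d, d), `beta` the decay."""
    lam = mu.copy()
    for t_k, j in events:
        if t_k < t:
            lam += alpha[:, j] * beta * np.exp(-beta * (t - t_k))
    return lam

mu = np.array([0.2, 0.1])
alpha = np.array([[0.3, 0.1],
                  [0.0, 0.4]])      # an event of type j excites type i via alpha[i, j]
events = [(0.5, 0), (1.2, 1), (2.0, 0)]
print(hawkes_intensity(2.5, events, mu, alpha, beta=1.0))
```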
no code implementations • NeurIPS 2015 • Kevin Scaman, Rémi Lemonnier, Nicolas Vayatis
Using this concept, we prove tight non-asymptotic bounds for the influence of a set of nodes, and we also provide an in-depth analysis of the critical time after which the contagion becomes super-critical.
no code implementations • NeurIPS 2014 • Remi Lemonnier, Kevin Scaman, Nicolas Vayatis
In this paper, we derive theoretical bounds for the long-term influence of a node in an Independent Cascade Model (ICM).
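As a reference point for what "influence" means here, a small Monte Carlo estimator of a node's influence under the Independent Cascade Model; exact influence computation is intractable in general, so sampling is the usual baseline, and the graph and transmission probabilities below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)

def icm_influence(adj_p, seed, n_runs=2000):
    """Monte Carlo estimate of the expected cascade size started from `seed`
    under the Independent Cascade Model: each newly activated node gets one
    chance to activate each neighbour j, with probability adj_p[i, j]."""
    n = adj_p.shape[0]
    total = 0
    for _ in range(n_runs):
        active = {seed}
        frontier = [seed]
        while frontier:
            new_frontier = []
            for i in frontier:
                for j in range(n):
                    if j not in active and rng.random() < adj_p[i, j]:
                        active.add(j)
                        new_frontier.append(j)
            frontier = new_frontier
        total += len(active)
    return total / n_runs

# Small directed line graph 0 -> 1 -> 2 with transmission probability 0.5.
adj_p = np.array([[0.0, 0.5, 0.0],
                  [0.0, 0.0, 0.5],
                  [0.0, 0.0, 0.0]])
print(icm_influence(adj_p, seed=0))   # expected size = 1 + 0.5 + 0.25 = 1.75
```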