no code implementations • 7 Feb 2024 • Arnulf Jentzen, Adrian Riekert
In this work we solve this research problem in the situation of shallow ANNs with the rectified linear unit (ReLU) and related activations and with the standard mean square error loss by disproving that SGD methods (such as plain vanilla SGD, momentum SGD, AdaGrad, RMSprop, and Adam) can find a global minimizer with high probability in the training of such ANNs.
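A minimal illustrative sketch of the setting this result concerns (not the authors' code): a shallow ReLU network trained under the standard mean square error loss with each of the named SGD-type optimizers. The synthetic data, target function, width, learning rates, and step counts are assumptions made purely for illustration.

```python
# Hypothetical sketch: shallow ReLU ANN, MSE loss, and the SGD-type
# optimizers named above; all hyperparameters are illustrative assumptions.
import math
import torch

def make_model(width=64, dim=1):
    # one hidden layer with ReLU activation, scalar output
    return torch.nn.Sequential(
        torch.nn.Linear(dim, width),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )

x = torch.rand(1024, 1)                  # synthetic input data on [0, 1]
y = torch.sin(2 * math.pi * x)           # an assumed target function
loss_fn = torch.nn.MSELoss()             # standard mean square error loss

optimizers = {
    "plain SGD": lambda p: torch.optim.SGD(p, lr=1e-2),
    "momentum":  lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
    "AdaGrad":   lambda p: torch.optim.Adagrad(p, lr=1e-1),
    "RMSprop":   lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "Adam":      lambda p: torch.optim.Adam(p, lr=1e-3),
}

for name, make_opt in optimizers.items():
    model = make_model()
    opt = make_opt(model.parameters())
    for step in range(2000):
        idx = torch.randint(0, x.shape[0], (32,))    # mini-batch indices
        loss = loss_fn(model(x[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"{name}: final mini-batch risk {loss.item():.4f}")
```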
no code implementations • 12 Apr 2023 • Adrian Riekert
In this article we identify a general class of high-dimensional continuous functions that can be approximated by deep neural networks (DNNs) with the rectified linear unit (ReLU) activation without the curse of dimensionality.
no code implementations • 7 Feb 2023 • Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger
The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature, and in that sense we refer to the introduced ANNs, in conjunction with their tailor-made initialization schemes, as Algorithmically Designed Artificial Neural Networks (ADANNs).
no code implementations • 13 Jul 2022 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss
The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry.
no code implementations • 17 Dec 2021 • Arnulf Jentzen, Adrian Riekert
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under three assumptions: that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between the input data and the output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum.
no code implementations • 13 Dec 2021 • Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa
In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs.
no code implementations • 18 Aug 2021 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss
In the second main result of this article we prove that, in the training of such ANNs and under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial, every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point.
no code implementations • 10 Aug 2021 • Arnulf Jentzen, Adrian Riekert
Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer - an open problem to prove (or disprove) the conjecture that the risk of the GD optimization method converges to zero in the training of such ANNs as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity.
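As an illustration of the conjecture's setup (a sketch under assumed data and hyperparameters, not the paper's method), one can run plain vanilla full-batch GD from several independent random initializations of a one-hidden-layer ReLU network and record the smallest risk reached as the width, the number of restarts, and the number of GD steps grow:

```python
# Hypothetical sketch: plain vanilla GD with independent random
# initializations for a one-hidden-layer ReLU network; data, widths,
# learning rate, and step counts are illustrative assumptions.
import torch

x = torch.rand(512, 1)                   # synthetic input data
y = x ** 2                               # an assumed target function

def run_gd(width, steps, lr=1e-2):
    model = torch.nn.Sequential(
        torch.nn.Linear(1, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
    )
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad         # deterministic full-batch GD step
    return torch.nn.functional.mse_loss(model(x), y).item()

for width in (8, 32, 128):
    risks = [run_gd(width, steps=2000) for _ in range(10)]  # 10 random inits
    print(f"width {width}: best risk over restarts = {min(risks):.5f}")
```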
no code implementations • 9 Jul 2021 • Arnulf Jentzen, Adrian Riekert
Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the result named above for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.
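A hedged sketch of this special situation (one hidden neuron, affine linear target function): the gradient flow (GF) trajectory can be approximated by a forward-Euler discretization of the flow equation $\theta'(t) = -\nabla\mathcal{L}(\theta(t))$. The data, initialization, and step size below are illustrative assumptions, not the paper's construction.

```python
# Hypothetical sketch: Euler discretization of the gradient flow for a ReLU
# network with a single hidden neuron and an assumed affine linear target.
import torch

x = torch.rand(256, 1)                    # synthetic input data
y = 2.0 * x + 1.0                         # affine linear target function

# parameters: hidden weight w, hidden bias b, output weight v, output bias c
theta = torch.tensor([1.0, 0.1, 1.0, 0.0], requires_grad=True)

def risk(theta):
    w, b, v, c = theta
    return ((v * torch.relu(w * x + b) + c - y) ** 2).mean()

dt = 1e-3                                 # small step size approximating GF
for _ in range(20_000):
    loss = risk(theta)
    grad, = torch.autograd.grad(loss, theta)
    with torch.no_grad():
        theta -= dt * grad                # forward-Euler step along the flow

print(f"risk along the discretized GF trajectory: {risk(theta).item():.6f}")
```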
no code implementations • 1 Apr 2021 • Arnulf Jentzen, Adrian Riekert
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation.
no code implementations • 19 Feb 2021 • Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek
This Lyapunov function is the central tool in our convergence proof of the gradient descent method.
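The specific Lyapunov function constructed in the article is not reproduced here; as a generic illustration of how such a function enters a convergence proof (a standard schema, not the paper's argument), suppose $V$ is bounded from below and decreases along the GD iterates $\theta_{n+1} = \theta_n - \gamma \nabla \mathcal{L}(\theta_n)$ according to
$$ V(\theta_{n+1}) \le V(\theta_n) - c\,\gamma\,\| \nabla \mathcal{L}(\theta_n) \|^2 \quad \text{for some } c > 0 . $$
Summing this estimate over $n$ shows that $\sum_{n} \| \nabla \mathcal{L}(\theta_n) \|^2 < \infty$ and hence that $\nabla \mathcal{L}(\theta_n) \to 0$ as $n \to \infty$.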
no code implementations • 15 Dec 2020 • Arnulf Jentzen, Adrian Riekert
Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view.