no code implementations • 7 Feb 2024 • Arnulf Jentzen, Adrian Riekert
In this work we solve this research problem in the situation of shallow ANNs with the rectified linear unit (ReLU) and related activations and with the standard mean square error loss by disproving that SGD methods (such as plain vanilla SGD, momentum SGD, AdaGrad, RMSprop, and Adam) can find a global minimizer with high probability in the training of such ANNs.
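A minimal illustrative sketch of the setting this result concerns (not the authors' code): a shallow ReLU network trained under the standard mean square error loss with each of the named SGD-type optimizers. The synthetic data, target function, width, learning rates, and step counts are assumptions made purely for illustration.

```python
# Hypothetical sketch: shallow ReLU ANN, MSE loss, and the SGD-type
# optimizers named above; all hyperparameters are illustrative assumptions.
import math
import torch

def make_model(width=64, dim=1):
    # one hidden layer with ReLU activation, scalar output
    return torch.nn.Sequential(
        torch.nn.Linear(dim, width),
        torch.nn.ReLU(),
        torch.nn.Linear(width, 1),
    )

x = torch.rand(1024, 1)                  # synthetic input data on [0, 1]
y = torch.sin(2 * math.pi * x)           # an assumed target function
loss_fn = torch.nn.MSELoss()             # standard mean square error loss

optimizers = {
    "plain SGD": lambda p: torch.optim.SGD(p, lr=1e-2),
    "momentum":  lambda p: torch.optim.SGD(p, lr=1e-2, momentum=0.9),
    "AdaGrad":   lambda p: torch.optim.Adagrad(p, lr=1e-1),
    "RMSprop":   lambda p: torch.optim.RMSprop(p, lr=1e-3),
    "Adam":      lambda p: torch.optim.Adam(p, lr=1e-3),
}

for name, make_opt in optimizers.items():
    model = make_model()
    opt = make_opt(model.parameters())
    for step in range(2000):
        idx = torch.randint(0, x.shape[0], (32,))    # mini-batch indices
        loss = loss_fn(model(x[idx]), y[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
    print(f"{name}: final mini-batch risk {loss.item():.4f}")
```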
no code implementations • 12 Apr 2023 • Adrian Riekert
In this article we identify a general class of high-dimensional continuous functions that can be approximated by deep neural networks (DNNs) with the rectified linear unit (ReLU) activation without the curse of dimensionality.
no code implementations • 7 Feb 2023 • Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger
The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature, and in that sense we refer to the introduced ANNs, in conjunction with their tailor-made initialization schemes, as Algorithmically Designed Artificial Neural Networks (ADANNs).
no code implementations • 13 Jul 2022 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss
The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry.
no code implementations • 17 Dec 2021 • Arnulf Jentzen, Adrian Riekert
In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under three assumptions: that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between the input data and the output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum.
no code implementations • 13 Dec 2021 • Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa
In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs.
no code implementations • 18 Aug 2021 • Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss
In the second main result of this article we prove that, in the training of such ANNs and under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial, every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point.
no code implementations • 10 Aug 2021 • Arnulf Jentzen, Adrian Riekert
Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer - an open problem to prove (or disprove) the conjecture that the risk of the GD optimization method converges to zero in the training of such ANNs as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity.
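As an illustration of the conjecture's setup (a sketch under assumed data and hyperparameters, not the paper's method), one can run plain vanilla full-batch GD from several independent random initializations of a one-hidden-layer ReLU network and record the smallest risk reached as the width, the number of restarts, and the number of GD steps grow:

```python
# Hypothetical sketch: plain vanilla GD with independent random
# initializations for a one-hidden-layer ReLU network; data, widths,
# learning rate, and step counts are illustrative assumptions.
import torch

x = torch.rand(512, 1)                   # synthetic input data
y = x ** 2                               # an assumed target function

def run_gd(width, steps, lr=1e-2):
    model = torch.nn.Sequential(
        torch.nn.Linear(1, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
    )
    for _ in range(steps):
        loss = torch.nn.functional.mse_loss(model(x), y)
        model.zero_grad()
        loss.backward()
        with torch.no_grad():
            for p in model.parameters():
                p -= lr * p.grad         # deterministic full-batch GD step
    return torch.nn.functional.mse_loss(model(x), y).item()

for width in (8, 32, 128):
    risks = [run_gd(width, steps=2000) for _ in range(10)]  # 10 random inits
    print(f"width {width}: best risk over restarts = {min(risks):.5f}")
```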
no code implementations • 9 Jul 2021 • Arnulf Jentzen, Adrian Riekert
Finally, in the special situation where there is only one neuron on the hidden layer (1-dimensional hidden layer) we strengthen the result named above for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.
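A hedged sketch of this special situation (one hidden neuron, affine linear target function): the gradient flow (GF) trajectory can be approximated by a forward-Euler discretization of the flow equation $\theta'(t) = -\nabla\mathcal{L}(\theta(t))$. The data, initialization, and step size below are illustrative assumptions, not the paper's construction.

```python
# Hypothetical sketch: Euler discretization of the gradient flow for a ReLU
# network with a single hidden neuron and an assumed affine linear target.
import torch

x = torch.rand(256, 1)                    # synthetic input data
y = 2.0 * x + 1.0                         # affine linear target function

# parameters: hidden weight w, hidden bias b, output weight v, output bias c
theta = torch.tensor([1.0, 0.1, 1.0, 0.0], requires_grad=True)

def risk(theta):
    w, b, v, c = theta
    return ((v * torch.relu(w * x + b) + c - y) ** 2).mean()

dt = 1e-3                                 # small step size approximating GF
for _ in range(20_000):
    loss = risk(theta)
    grad, = torch.autograd.grad(loss, theta)
    with torch.no_grad():
        theta -= dt * grad                # forward-Euler step along the flow

print(f"risk along the discretized GF trajectory: {risk(theta).item():.6f}")
```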
no code implementations • 1 Apr 2021 • Arnulf Jentzen, Adrian Riekert
In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation.
no code implementations • 19 Feb 2021 • Patrick Cheridito, Arnulf Jentzen, Adrian Riekert, Florian Rossmannek
This Lyapunov function is the central tool in our convergence proof of the gradient descent method.
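The specific Lyapunov function constructed in the article is not reproduced here; as a generic illustration of how such a function enters a convergence proof (a standard schema, not the paper's argument), suppose $V$ is bounded from below and decreases along the GD iterates $\theta_{n+1} = \theta_n - \gamma \nabla \mathcal{L}(\theta_n)$ according to
$$ V(\theta_{n+1}) \le V(\theta_n) - c\,\gamma\,\| \nabla \mathcal{L}(\theta_n) \|^2 \quad \text{for some } c > 0 . $$
Summing this estimate over $n$ shows that $\sum_{n} \| \nabla \mathcal{L}(\theta_n) \|^2 < \infty$ and hence that $\nabla \mathcal{L}(\theta_n) \to 0$ as $n \to \infty$.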
no code implementations • 15 Dec 2020 • Arnulf Jentzen, Adrian Riekert
Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view.