Search Results for author: Arnulf Jentzen

Found 45 papers, 5 papers with code

Non-convergence to global minimizers for Adam and stochastic gradient descent optimization and constructions of local minimizers in the training of artificial neural networks

no code implementations7 Feb 2024 Arnulf Jentzen, Adrian Riekert

In this work we solve this research problem in the setting of shallow ANNs with the rectified linear unit (ReLU) and related activations and the standard mean square error loss by disproving that SGD methods (such as plain vanilla SGD, momentum SGD, AdaGrad, RMSprop, and the Adam optimizer) can find a global minimizer with high probability in the training of such ANNs.
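
For orientation, a minimal sketch (assuming PyTorch and a hypothetical one-dimensional target function; not the paper's code) of the training setting the result concerns: a shallow ReLU ANN trained under the mean square error loss with an SGD-type optimizer such as Adam.

```python
# Minimal sketch (not the paper's code) of the training setting studied:
# a shallow ReLU network trained under mean square error loss with
# SGD-type optimizers such as plain SGD, momentum SGD, or Adam.
import torch

torch.manual_seed(0)
d, width, n = 1, 16, 512                       # input dim, hidden width, sample size
x = torch.rand(n, d) * 2 - 1                   # inputs, uniform on [-1, 1]^d
y = torch.sin(3 * x).sum(dim=1, keepdim=True)  # hypothetical target function

model = torch.nn.Sequential(                   # one hidden layer, ReLU activation
    torch.nn.Linear(d, width), torch.nn.ReLU(), torch.nn.Linear(width, 1)
)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)  # swap for SGD / momentum / RMSprop
loss_fn = torch.nn.MSELoss()

for step in range(2000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    opt.step()
# The paper proves that, with high probability, such runs end up at
# non-global local minimizers rather than at a global minimizer.
print(float(loss))
```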

Mathematical Introduction to Deep Learning: Methods, Implementations, and Theory

1 code implementation31 Oct 2023 Arnulf Jentzen, Benno Kuckuck, Philippe von Wurstemberger

This book aims to provide an introduction to the topic of deep learning algorithms.

On the existence of minimizers in shallow residual ReLU neural network optimization landscapes

no code implementations28 Feb 2023 Steffen Dereich, Arnulf Jentzen, Sebastian Kassing

Many mathematical convergence results for gradient descent (GD) based algorithms employ the assumption that the GD process is (almost surely) bounded and, also in concrete numerical simulations, divergence of the GD process may slow down, or even completely rule out, convergence of the error function.

Algorithmically Designed Artificial Neural Networks (ADANNs): Higher order deep operator learning for parametric partial differential equations

no code implementations7 Feb 2023 Arnulf Jentzen, Adrian Riekert, Philippe von Wurstemberger

The obtained ANN architectures and their initialization schemes are thus strongly inspired by numerical algorithms as well as by popular deep learning methodologies from the literature and in that sense we refer to the introduced ANNs in conjunction with their tailor-made initialization schemes as Algorithmically Designed Artificial Neural Networks (ADANNs).

Operator learning

The necessity of depth for artificial neural networks to approximate certain classes of smooth and bounded functions without the curse of dimensionality

no code implementations19 Jan 2023 Lukas Gonon, Robin Graeber, Arnulf Jentzen

In particular, it is a key contribution of this work to reveal that for all $a, b\in\mathbb{R}$ with $b-a\geq 7$ we have that the functions $[a, b]^d\ni x=(x_1,\dots, x_d)\mapsto\prod_{i=1}^d x_i\in\mathbb{R}$ for $d\in\mathbb{N}$ as well as the functions $[a, b]^d\ni x =(x_1,\dots, x_d)\mapsto\sin(\prod_{i=1}^d x_i) \in \mathbb{R}$ for $d \in \mathbb{N}$ cannot be approximated without the curse of dimensionality by shallow ANNs or by insufficiently deep ANNs with ReLU activation, but can be approximated without the curse of dimensionality by sufficiently deep ANNs with ReLU activation.
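
For concreteness, a small sketch of the two target families appearing in the statement (the interval endpoints and the dimension below are placeholder choices):

```python
# Sketch of the two target families considered in the result:
# the d-dimensional product function and its sine, on a cube [a, b]^d.
import numpy as np

def product_target(x):
    """f(x_1, ..., x_d) = prod_i x_i, for x of shape (..., d)."""
    return np.prod(x, axis=-1)

def sine_product_target(x):
    """g(x_1, ..., x_d) = sin(prod_i x_i)."""
    return np.sin(np.prod(x, axis=-1))

a, b, d = 0.0, 7.0, 5                       # the result requires b - a >= 7
x = a + (b - a) * np.random.rand(1000, d)   # sample points in [a, b]^d
print(product_target(x).shape, sine_product_target(x).shape)
```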

Gradient descent provably escapes saddle points in the training of shallow ReLU networks

no code implementations3 Aug 2022 Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

Dynamical systems theory has recently been applied in optimization to prove that gradient descent algorithms avoid so-called strict saddle points of the loss function.

Normalized gradient flow optimization in the training of ReLU artificial neural networks

no code implementations13 Jul 2022 Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg Weiss

The training of artificial neural networks (ANNs) is nowadays a highly relevant algorithmic procedure with many applications in science and industry.

On bounds for norms of reparameterized ReLU artificial neural network parameters: sums of fractional powers of the Lipschitz norm control the network parameter vector

no code implementations27 Jun 2022 Arnulf Jentzen, Timo Kröger

Furthermore, we prove that this upper bound only holds for sums of powers of the Lipschitz norm with the exponents $ 1/2 $ and $ 1 $ but does not hold for the Lipschitz norm alone.

On the existence of global minima and convergence analyses for gradient descent methods in the training of deep neural networks

no code implementations17 Dec 2021 Arnulf Jentzen, Adrian Riekert

In this article we study fully-connected feedforward deep ReLU ANNs with an arbitrarily large number of hidden layers and we prove convergence of the risk of the GD optimization method with random initializations in the training of such ANNs under the assumptions that the unnormalized probability density function of the probability distribution of the input data of the considered supervised learning problem is piecewise polynomial, that the target function (describing the relationship between the input data and the output data) is piecewise polynomial, and that the risk function of the considered supervised learning problem admits at least one regular global minimum.

Convergence proof for stochastic gradient descent in the training of deep neural networks with ReLU activation for constant target functions

no code implementations13 Dec 2021 Martin Hutzenthaler, Arnulf Jentzen, Katharina Pohl, Adrian Riekert, Luca Scarpa

In many numerical simulations stochastic gradient descent (SGD) type optimization methods perform very effectively in the training of deep neural networks (DNNs), but to this day it remains an open research problem to provide a mathematical convergence analysis which rigorously explains the success of SGD type optimization methods in the training of DNNs.

Existence, uniqueness, and convergence rates for gradient flows in the training of artificial neural networks with ReLU activation

no code implementations18 Aug 2021 Simon Eberle, Arnulf Jentzen, Adrian Riekert, Georg S. Weiss

In the second main result of this article we prove, under the assumption that the target function and the density function of the probability distribution of the input data are piecewise polynomial, that in the training of such ANNs every non-divergent GF trajectory converges with an appropriate rate of convergence to a critical point and that the risk of the non-divergent GF trajectory converges with rate 1 to the risk of the critical point.

A proof of convergence for the gradient descent optimization method with random initializations in the training of neural networks with ReLU activation for piecewise linear target functions

no code implementations10 Aug 2021 Arnulf Jentzen, Adrian Riekert

Despite the great success of GD type optimization methods in numerical simulations for the training of ANNs with ReLU activation, it remains - even in the simplest situation of the plain vanilla GD optimization method with random initializations and ANNs with one hidden layer - an open problem to prove (or disprove) the conjecture that the risk of the GD optimization method converges in the training of such ANNs to zero as the width of the ANNs, the number of independent random initializations, and the number of GD steps increase to infinity.

Convergence analysis for gradient flows in the training of artificial neural networks with ReLU activation

no code implementations9 Jul 2021 Arnulf Jentzen, Adrian Riekert

Finally, in the special situation where there is only one neuron in the hidden layer (1-dimensional hidden layer) we strengthen the above-named result for affine linear target functions by proving that the risk of every (not necessarily bounded) GF trajectory converges to zero if the initial risk is sufficiently small.

A proof of convergence for stochastic gradient descent in the training of artificial neural networks with ReLU activation for constant target functions

no code implementations1 Apr 2021 Arnulf Jentzen, Adrian Riekert

In this article we study the stochastic gradient descent (SGD) optimization method in the training of fully-connected feedforward artificial neural networks with ReLU activation.

Landscape analysis for shallow neural networks: complete classification of critical points for affine target functions

no code implementations19 Mar 2021 Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

In this paper, we analyze the landscape of the true loss of neural networks with one hidden layer and ReLU, leaky ReLU, or quadratic activation.

Convergence rates for gradient descent in the training of overparameterized artificial neural networks with biases

no code implementations23 Feb 2021 Arnulf Jentzen, Timo Kröger

In recent years, artificial neural networks have developed into a powerful tool for dealing with a multitude of problems for which classical solution approaches reach their limits.

An overview on deep learning-based approximation methods for partial differential equations

no code implementations22 Dec 2020 Christian Beck, Martin Hutzenthaler, Arnulf Jentzen, Benno Kuckuck

It is one of the most challenging problems in applied mathematics to approximatively solve high-dimensional partial differential equations (PDEs).

Strong overall error analysis for the training of artificial neural networks via random initializations

no code implementations15 Dec 2020 Arnulf Jentzen, Adrian Riekert

Although deep learning based approximation algorithms have been applied very successfully to numerous problems, at the moment the reasons for their performance are not entirely understood from a mathematical point of view.

Stochastic Optimization

Deep learning based numerical approximation algorithms for stochastic partial differential equations and high-dimensional nonlinear filtering problems

no code implementations2 Dec 2020 Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

In this article we introduce and study a deep learning based approximation algorithm for solutions of stochastic partial differential equations (SPDEs).

Weak error analysis for stochastic gradient descent optimization algorithms

no code implementations3 Jul 2020 Aritz Bercher, Lukas Gonon, Arnulf Jentzen, Diyora Salimova

In applications one is often not only interested in the size of the error with respect to the objective function but also in the size of the error with respect to a test function which is possibly different from the objective function.

Non-convergence of stochastic gradient descent in the training of deep neural networks

no code implementations12 Jun 2020 Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

Deep neural networks have successfully been trained in various application areas with stochastic gradient descent.

Space-time deep neural network approximations for high-dimensional partial differential equations

no code implementations3 Jun 2020 Fabian Hornung, Arnulf Jentzen, Diyora Salimova

Each of these results establishes that DNNs overcome the curse of dimensionality in approximating suitable PDE solutions at a fixed time point $T>0$ and on a compact cube $[a, b]^d$ in space but none of these results provides an answer to the question whether the entire PDE solution on $[0, T]\times [a, b]^d$ can be approximated by DNNs without the curse of dimensionality.

Overall error analysis for the training of deep neural networks via stochastic gradient descent with random initialisation

no code implementations3 Mar 2020 Arnulf Jentzen, Timo Welti

In spite of the accomplishments of deep learning based algorithms in numerous applications and very broad corresponding research interest, at the moment there is still no rigorous understanding of the reasons why such algorithms produce useful results in certain situations.

Pricing and hedging American-style options with deep learning

1 code implementation23 Dec 2019 Sebastian Becker, Patrick Cheridito, Arnulf Jentzen

In this paper we introduce a deep learning method for pricing and hedging American-style options.

Efficient approximation of high-dimensional functions with neural networks

no code implementations9 Dec 2019 Patrick Cheridito, Arnulf Jentzen, Florian Rossmannek

In this paper, we develop a framework for showing that neural networks can overcome the curse of dimensionality in different high-dimensional approximation problems.

Numerical Analysis (MSC 68T07; ACM I.2.0)

Uniform error estimates for artificial neural network approximations for heat equations

no code implementations20 Nov 2019 Lukas Gonon, Philipp Grohs, Arnulf Jentzen, David Kofler, David Šiška

These mathematical results from the scientific literature prove in part that algorithms based on ANNs are capable of overcoming the curse of dimensionality in the numerical approximation of high-dimensional PDEs.

Full error analysis for the training of deep neural networks

no code implementations30 Sep 2019 Christian Beck, Arnulf Jentzen, Benno Kuckuck

In this work we estimate for a certain deep learning algorithm each of these three errors and combine these three error estimates to obtain an overall error analysis for the deep learning algorithm under consideration.

Deep neural network approximations for Monte Carlo algorithms

1 code implementation28 Aug 2019 Philipp Grohs, Arnulf Jentzen, Diyora Salimova

One key argument in most of these results is, first, to use a Monte Carlo approximation scheme which can approximate the solution of the PDE under consideration at a fixed space-time point without the curse of dimensionality and, thereafter, to prove that DNNs are flexible enough to mimic the behaviour of the used approximation scheme.
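
As a hedged illustration of the first ingredient (assuming the simplest heat-equation case, which need not be the PDE class treated in the paper), the Feynman-Kac formula yields a Monte Carlo estimator of the solution at one fixed space-time point whose cost does not grow exponentially in the dimension:

```python
# Illustrative sketch (assumption: the heat equation u_t = (1/2) * Laplacian(u)
# with terminal condition phi).  By the Feynman-Kac formula,
# u(T, x) = E[phi(x + W_T)] can be estimated at a single space-time point
# by plain Monte Carlo averaging, even in high dimension.
import numpy as np

def mc_heat_solution(phi, x, T, num_samples=100_000, rng=None):
    rng = rng or np.random.default_rng(0)
    d = x.shape[0]
    w = rng.standard_normal((num_samples, d)) * np.sqrt(T)  # Brownian increments W_T
    return phi(x + w).mean()                                 # Monte Carlo average

d = 100                                                      # high dimension
phi = lambda y: np.exp(-np.sum(y**2, axis=-1) / (2 * d))     # hypothetical test function
print(mc_heat_solution(phi, np.zeros(d), T=1.0))
```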

Space-time error estimates for deep neural network approximations for differential equations

no code implementations11 Aug 2019 Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philipp Zimmermann

It is the subject of the main result of this article to provide space-time error estimates for DNN approximations of Euler approximations of certain perturbed differential equations.

Solving high-dimensional optimal stopping problems using deep learning

no code implementations5 Aug 2019 Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Timo Welti

We present numerical results for a large number of example problems, which include the pricing of many high-dimensional American and Bermudan options, such as Bermudan max-call options in up to 5000 dimensions.
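
For concreteness, a small sketch of the payoff of the Bermudan max-call options mentioned above (the strike and the price scenarios below are hypothetical):

```python
# Sketch of the payoff of a Bermudan max-call option (assumed strike K):
# at exercise the option pays max(max_i S_i - K, 0).
import numpy as np

def max_call_payoff(s, strike):
    """s: asset prices of shape (..., d); returns max(max_i s_i - strike, 0)."""
    return np.maximum(np.max(s, axis=-1) - strike, 0.0)

d = 5000                                    # dimensions as in the largest examples
s = 90 + 20 * np.random.rand(10, d)         # hypothetical price scenarios
print(max_call_payoff(s, strike=100.0))
```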

Deep splitting method for parabolic PDEs

no code implementations8 Jul 2019 Christian Beck, Sebastian Becker, Patrick Cheridito, Arnulf Jentzen, Ariel Neufeld

In this paper we introduce a numerical method for nonlinear parabolic PDEs that combines operator splitting with deep learning.

Towards a regularity theory for ReLU networks -- chain rule and global error estimates

no code implementations13 May 2019 Julius Berner, Dennis Elbrächter, Philipp Grohs, Arnulf Jentzen

Although for neural networks with locally Lipschitz continuous activation functions the classical derivative exists almost everywhere, the standard chain rule is in general not applicable.
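
The issue can be made concrete with a small example (not taken from the paper): the function $h(x) = \mathrm{ReLU}(x) - \mathrm{ReLU}(-x)$ equals the identity and thus has derivative $1$ everywhere, yet a formal application of the chain rule with the common convention $\mathrm{ReLU}'(0) = 0$ assigns it derivative $0$ at the origin.

```python
# Concrete illustration (not from the paper) of the chain-rule issue for ReLU:
# h(x) = relu(x) - relu(-x) equals x, so h'(0) = 1, yet the formal chain rule
# with the common convention relu'(0) = 0 yields 0 at x = 0.
relu = lambda x: max(x, 0.0)
relu_prime = lambda x: 1.0 if x > 0 else 0.0     # a.e. derivative, convention at 0

h = lambda x: relu(x) - relu(-x)                 # equals the identity function
# formal chain rule: d/dx relu(x) = relu'(x), d/dx relu(-x) = -relu'(-x)
formal_chain_rule = lambda x: relu_prime(x) + relu_prime(-x)

print(h(0.0))                  # 0.0, and the true derivative at 0 is 1
print(formal_chain_rule(0.0))  # 0.0 -- disagrees with the true derivative 1
print(formal_chain_rule(0.5))  # 1.0 -- agrees away from the kink
```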

Convergence rates for the stochastic gradient descent method for non-convex objective functions

no code implementations2 Apr 2019 Benjamin Fehrman, Benjamin Gess, Arnulf Jentzen

We prove the local convergence to minima and estimates on the rate of convergence for the stochastic gradient descent method in the case of not necessarily globally convex nor contracting objective functions.

A proof that deep artificial neural networks overcome the curse of dimensionality in the numerical approximation of Kolmogorov partial differential equations with constant diffusion and nonlinear drift coefficients

no code implementations19 Sep 2018 Arnulf Jentzen, Diyora Salimova, Timo Welti

These numerical simulations indicate that DNNs seem to possess the fundamental flexibility to overcome the curse of dimensionality in the sense that the number of real parameters used to describe the DNN grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $ \varepsilon > 0 $ and the dimension $ d \in \mathbb{N}$ of the function which the DNN aims to approximate in such computational problems.
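
To make the counted quantity concrete, a small sketch (with a hypothetical architecture, not taken from the paper) of the number of real parameters of a fully connected DNN, which the result asserts grows at most polynomially in $1/\varepsilon$ and $d$:

```python
# Sketch of the quantity the result bounds: the number of real parameters
# (weights and biases) of a fully connected DNN with layer widths l_0, ..., l_L.
def num_parameters(layer_widths):
    return sum((layer_widths[k - 1] + 1) * layer_widths[k]
               for k in range(1, len(layer_widths)))

d = 100                                 # input dimension
widths = [d, 2 * d, 2 * d, 1]           # hypothetical architecture
print(num_parameters(widths))           # the theorem: this stays polynomial in d and 1/eps
```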

Analysis of the Generalization Error: Empirical Risk Minimization over Deep Artificial Neural Networks Overcomes the Curse of Dimensionality in the Numerical Approximation of Black-Scholes Partial Differential Equations

no code implementations9 Sep 2018 Julius Berner, Philipp Grohs, Arnulf Jentzen

It can be concluded that ERM over deep neural network hypothesis classes overcomes the curse of dimensionality for the numerical solution of linear Kolmogorov equations with affine coefficients.

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations

no code implementations7 Sep 2018 Philipp Grohs, Fabian Hornung, Arnulf Jentzen, Philippe von Wurstemberger

Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems.

Solving the Kolmogorov PDE by means of deep learning

no code implementations1 Jun 2018 Christian Beck, Sebastian Becker, Philipp Grohs, Nor Jaafari, Arnulf Jentzen

Stochastic differential equations (SDEs) and the Kolmogorov partial differential equations (PDEs) associated to them have been widely used in models from engineering, finance, and the natural sciences.
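
A minimal sketch of this kind of approach in the simplest heat-equation special case (assuming PyTorch; not the paper's code): the solution map $x \mapsto u(T, x)$ is learned by $L^2$-regression of simulated terminal values on random initial points.

```python
# Minimal sketch (heat-equation special case, not the paper's code) of learning
# the whole map x -> u(T, x) = E[phi(x + W_T)]: draw random initial points xi,
# simulate the underlying SDE (here just Brownian motion), and fit a network
# u_theta by minimizing E[|phi(xi + W_T) - u_theta(xi)|^2].
import torch

torch.manual_seed(0)
d, T = 10, 1.0
phi = lambda y: torch.exp(-(y**2).sum(dim=-1, keepdim=True) / (2 * d))  # hypothetical terminal condition

u_theta = torch.nn.Sequential(
    torch.nn.Linear(d, 64), torch.nn.ReLU(), torch.nn.Linear(64, 1)
)
opt = torch.optim.Adam(u_theta.parameters(), lr=1e-3)

for step in range(2000):
    xi = torch.rand(256, d) * 2 - 1        # random initial points in [-1, 1]^d
    w_T = torch.randn(256, d) * T**0.5     # Brownian motion at time T
    loss = ((phi(xi + w_T) - u_theta(xi))**2).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```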

Strong error analysis for stochastic gradient descent optimization algorithms

no code implementations29 Jan 2018 Arnulf Jentzen, Benno Kuckuck, Ariel Neufeld, Philippe von Wurstemberger

Stochastic gradient descent (SGD) optimization algorithms are key ingredients in a series of machine learning applications.

Numerical Analysis, Probability

Solving high-dimensional partial differential equations using deep learning

6 code implementations9 Jul 2017 Jiequn Han, Arnulf Jentzen, Weinan E

Developing algorithms for solving high-dimensional partial differential equations (PDEs) has been an exceedingly difficult task for a long time, due to the notoriously difficult problem known as the "curse of dimensionality".

Deep learning-based numerical methods for high-dimensional parabolic partial differential equations and backward stochastic differential equations

5 code implementations15 Jun 2017 Weinan E, Jiequn Han, Arnulf Jentzen

We propose a new algorithm for solving parabolic partial differential equations (PDEs) and backward stochastic differential equations (BSDEs) in high dimension, by making an analogy between the BSDE and reinforcement learning with the gradient of the solution playing the role of the policy function, and the loss function given by the error between the prescribed terminal condition and the solution of the BSDE.
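
Since the snippet outlines the algorithm, here is a heavily simplified sketch (assuming PyTorch, unit diffusion $dX = dW$, an equidistant time grid, and hypothetical driver and terminal condition; not the authors' implementation): the initial value $Y_0 \approx u(0, x_0)$ and the step-wise "policies" $Z_n \approx$ gradient of the solution are trainable, the BSDE is stepped forward in time, and the loss is the mismatch with the prescribed terminal condition.

```python
# Simplified sketch of the deep-BSDE idea described above (not the authors' code).
import torch

torch.manual_seed(0)
d, N, T, batch = 10, 20, 1.0, 256
dt = T / N
g = lambda x: torch.log(0.5 * (1 + (x**2).sum(dim=-1, keepdim=True)))  # hypothetical terminal condition
f = lambda y, z: -0.5 * (z**2).sum(dim=-1, keepdim=True)               # hypothetical driver / nonlinearity

y0 = torch.nn.Parameter(torch.zeros(1))                   # u(0, x_0), to be learned
z_nets = torch.nn.ModuleList(                             # Z_n ~ gradient of the solution at step n
    torch.nn.Sequential(torch.nn.Linear(d, 32), torch.nn.ReLU(), torch.nn.Linear(32, d))
    for _ in range(N)
)
opt = torch.optim.Adam([y0, *z_nets.parameters()], lr=1e-2)

for step in range(500):
    x = torch.zeros(batch, d)                             # X_0 = x_0 = 0
    y = y0.expand(batch, 1)
    for n in range(N):
        z = z_nets[n](x)
        dw = torch.randn(batch, d) * dt**0.5              # Brownian increment
        y = y - f(y, z) * dt + (z * dw).sum(dim=-1, keepdim=True)  # BSDE step
        x = x + dw                                         # forward SDE step (unit diffusion)
    loss = ((y - g(x))**2).mean()                          # terminal-condition mismatch
    opt.zero_grad(); loss.backward(); opt.step()
print(float(y0))                                           # approximation of u(0, x_0)
```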

Reinforcement Learning (RL)
