Search Results for author: Borjan Geshkovski

Found 5 papers, 4 papers with code

A mathematical perspective on Transformers

1 code implementation 17 Dec 2023 Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

Transformers play a central role in the inner workings of large language models.

The emergence of clusters in self-attention dynamics

1 code implementation NeurIPS 2023 Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet

Cluster locations are determined by the initial tokens, confirming context-awareness of representations learned by Transformers.

Turnpike in optimal control of PDEs, ResNets, and beyond

no code implementations 8 Feb 2022 Borjan Geshkovski, Enrique Zuazua

The \emph{turnpike property} in contemporary macroeconomics asserts that if an economic planner seeks to move an economy from one level of capital to another, then the most efficient path, as long as the planner has enough time, is to rapidly move stock to a level close to the optimal stationary or constant path, then allow for capital to develop along that path until the desired term is nearly reached, at which point the stock ought to be moved to the final target.

Sparsity in long-time control of neural ODEs

1 code implementation 26 Feb 2021 Carlos Esteve-Yagüe, Borjan Geshkovski

We consider the neural ODE and optimal control perspective of supervised learning, with $\ell^1$-control penalties, where rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon.

Large-time asymptotics in deep learning

1 code implementation 6 Aug 2020 Carlos Esteve, Borjan Geshkovski, Dario Pighin, Enrique Zuazua

We consider the neural ODE perspective of supervised learning and study the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
