1 code implementation • 17 Dec 2023 • Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
Transformers play a central role in the inner workings of large language models.
1 code implementation • NeurIPS 2023 • Borjan Geshkovski, Cyril Letrouit, Yury Polyanskiy, Philippe Rigollet
Cluster locations are determined by the initial tokens, confirming context-awareness of representations learned by Transformers.
no code implementations • 8 Feb 2022 • Borjan Geshkovski, Enrique Zuazua
The \emph{turnpike property} in contemporary macroeconomics asserts that if an economic planner seeks to move an economy from one level of capital to another and has enough time, then the most efficient path has three phases: rapidly move the stock to a level close to the optimal stationary (constant) path, let capital develop along that path until the desired term is nearly reached, and then move the stock to the final target.
1 code implementation • 26 Feb 2021 • Carlos Esteve-Yagüe, Borjan Geshkovski
We consider the neural ODE and optimal control perspective of supervised learning with $\ell^1$-control penalties, in which, rather than only minimizing a final cost (the \emph{empirical risk}) for the state, we integrate this cost over the entire time horizon.
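As a rough illustration of this setup, in notation that is assumed here rather than taken from the paper (states $x_i(t)$ for $N$ training samples with labels $y_i$ and initial data $x_i^0$, controls $\theta(t)$, a per-sample loss, a vector field $f$, and a penalty weight $\lambda > 0$), the integrated-cost problem might be written as

$$
\min_{\theta} \; \int_0^T \frac{1}{N} \sum_{i=1}^{N} \mathrm{loss}\big(x_i(t), y_i\big) \, dt \; + \; \lambda \int_0^T \|\theta(t)\|_{1} \, dt
\quad \text{subject to} \quad \dot{x}_i(t) = f\big(x_i(t), \theta(t)\big), \quad x_i(0) = x_i^0 .
$$

Under the same assumed notation, a final-cost formulation would instead keep only $\frac{1}{N} \sum_{i=1}^{N} \mathrm{loss}\big(x_i(T), y_i\big)$ in place of the first integral.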
1 code implementation • 6 Aug 2020 • Carlos Esteve, Borjan Geshkovski, Dario Pighin, Enrique Zuazua
We consider the neural ODE perspective of supervised learning and study the impact of the final time $T$ (which may indicate the depth of a corresponding ResNet) in training.
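The link between the final time $T$ and ResNet depth can be sketched with a forward Euler discretization. The snippet below is a minimal illustration, not the authors' implementation: the vector field $\dot{x}(t) = \tanh(W(t)x(t) + b(t))$, the function name `resnet_forward`, and the piecewise-constant parameters are all assumptions made for the example.

```python
import math


def resnet_forward(x0, weights, biases, T):
    """Forward Euler discretization of x'(t) = tanh(W(t) x(t) + b(t)) on [0, T].

    Each Euler step plays the role of one residual layer:
        x_{k+1} = x_k + h * tanh(W_k x_k + b_k),  with h = T / n_layers.
    The vector field and parameter layout are illustrative assumptions.
    """
    n_layers = len(weights)
    h = T / n_layers  # step size: time horizon divided by depth
    x = list(x0)
    for W, b in zip(weights, biases):
        # Pre-activation W_k x_k + b_k, computed row by row.
        pre = [
            sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)
        ]
        # Residual update: identity skip connection plus a scaled nonlinearity.
        x = [x_i + h * math.tanh(p_i) for x_i, p_i in zip(x, pre)]
    return x


# Hypothetical usage: a 1-dimensional state, 4 layers, time horizon T = 2.
zero_W = [[0.0]]
out = resnet_forward([1.0], [zero_W] * 4, [[0.5]] * 4, T=2.0)
```

In this sketch, keeping the step size `h` fixed while increasing `T` means adding layers, which is one way to read $T$ as a proxy for depth; keeping `T` fixed while adding layers only refines the discretization.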