no code implementations • 19 Dec 2023 • Hui Wu, Yi Gan, Feng Yuan, Jing Ma, Wei Zhu, Yutao Xu, Hong Zhu, Yuhua Zhu, Xiaoli Liu, Jinghui Gu
A customized scaled-dot-product-attention kernel is designed to match our fusion policy, based on the segmented KV cache solution.
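For readers unfamiliar with the mechanism, below is a minimal NumPy sketch of scaled dot-product attention reading keys and values from a segmented KV cache during token-by-token decoding. The segment size, shapes, and cache layout are illustrative assumptions, not the fused kernel described in the paper.

```python
import numpy as np

d, seg = 64, 16                  # head dimension and segment length (assumptions)
segments_k, segments_v = [], []  # the KV cache: a list of fixed-size segments

def append_kv(k, v):
    # Grow the cache one token at a time, opening a new segment when the
    # current one is full; this avoids reallocating one large contiguous buffer.
    if not segments_k or segments_k[-1].shape[0] == seg:
        segments_k.append(np.empty((0, d)))
        segments_v.append(np.empty((0, d)))
    segments_k[-1] = np.vstack([segments_k[-1], k])
    segments_v[-1] = np.vstack([segments_v[-1], v])

def attend(q):
    # Scaled dot-product attention over all cached keys/values.
    K, V = np.vstack(segments_k), np.vstack(segments_v)
    scores = (q @ K.T) / np.sqrt(d)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ V

rng = np.random.default_rng(0)
for _ in range(40):              # decode loop: one new token per step
    append_kv(rng.normal(size=(1, d)), rng.normal(size=(1, d)))
    out = attend(rng.normal(size=d))
```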
1 code implementation • 4 May 2023 • Jose A. Carrillo, Nicolas Garcia Trillos, Sixu Li, Yuhua Zhu
Federated learning is an important framework in modern machine learning that seeks to integrate the training of learning models across multiple users, each with their own local data set, in a way that is sensitive to data privacy and to communication-loss constraints.
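To illustrate the general framework, here is a minimal federated-averaging sketch: each user trains on a private local data set and only model parameters are communicated. FedAvg is a standard baseline used purely for illustration here; it is not the consensus-based method proposed in this paper, and all data and constants are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_users, rounds, local_steps, lr = 5, 20, 10, 0.1

# Private local data: each user holds noisy observations of a shared target.
target = 3.0
local_data = [target + rng.normal(0, 1, size=100) for _ in range(n_users)]

theta = 0.0                              # global model (a scalar mean estimate)
for _ in range(rounds):
    local_models = []
    for data in local_data:
        w = theta                        # start from the current global model
        for _ in range(local_steps):     # local SGD on the user's own data
            x = rng.choice(data)
            w -= lr * 2 * (w - x)
        local_models.append(w)           # only parameters leave the device
    theta = float(np.mean(local_models)) # the server averages the local models
```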
no code implementations • 14 Oct 2022 • Yuhua Zhu, Zachary Izzo, Lexing Ying
The optimal policy for the limiting HJB equation can be explicitly obtained for several common bandit problems, and we give numerical methods to solve the HJB equation when an explicit solution is not available.
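As an illustration of what solving such an equation numerically can look like, here is a toy explicit finite-difference scheme for a one-dimensional HJB equation of the form $u_t + \max_a \{ a\,u_x + \tfrac{1}{2}\sigma^2 u_{xx} + r(x,a) \} = 0$ with terminal condition $u(T,x) = g(x)$, marched backward in time. The drift, reward, terminal condition, and boundary handling are all assumptions for illustration, not the limiting equation derived in the paper.

```python
import numpy as np

sigma = 0.5
T, nx, nt = 1.0, 201, 20000
x = np.linspace(-2.0, 2.0, nx)
dx = x[1] - x[0]
dt = T / nt                               # small enough for explicit stability
actions = np.array([-1.0, 0.0, 1.0])      # finite action set (assumption)

def running_reward(xs, a):                # running reward r(x, a) (assumption)
    return -0.5 * xs**2 - 0.1 * a**2

u = np.maximum(0.0, 1.0 - x**2)           # terminal condition g(x) (assumption)

for _ in range(nt):                       # march backward from t = T to t = 0
    ux = np.gradient(u, dx)               # first derivative (central differences)
    uxx = np.zeros_like(u)
    uxx[1:-1] = (u[2:] - 2.0 * u[1:-1] + u[:-2]) / dx**2
    # Hamiltonian: maximize over the action set at every grid point.
    H = np.max([a * ux + 0.5 * sigma**2 * uxx + running_reward(x, a)
                for a in actions], axis=0)
    u = u + dt * H
    u[0], u[-1] = u[1], u[-2]             # Neumann boundary condition (assumption)
```

The greedy action at each grid point (the argmax inside the Hamiltonian) then gives the numerically computed optimal policy.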
no code implementations • 2 Dec 2021 • Xiaowu Dai, Yuhua Zhu
We study the statistical properties of the dynamic trajectory of stochastic gradient descent (SGD).
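A simple way to see what "statistical properties of the trajectory" means in practice is to simulate SGD on a noisy quadratic and record the iterates; the setup below is an illustrative assumption, not the paper's model.

```python
import numpy as np

rng = np.random.default_rng(0)
lr, steps = 0.1, 20000
theta, traj = 5.0, []
for _ in range(steps):
    grad = 2 * theta + rng.normal(0, 1)   # stochastic gradient of theta^2
    theta -= lr * grad
    traj.append(theta)

traj = np.array(traj[steps // 2:])        # discard burn-in
print(traj.mean(), traj.var())            # stationary statistics of the trajectory
```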
no code implementations • 25 Oct 2021 • Xun Tang, Lexing Ying, Yuhua Zhu
When the error is in the residual norm, we prove that the shifting factor is always positive and upper bounded by $1+O\left(1/n\right)$, where $n$ is the number of samples used in learning each row of the transition matrix.
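For concreteness, the snippet below shows what "learning each row of the transition matrix from $n$ samples" refers to in the simplest model-based setting: the empirical row estimator. The toy matrix is an assumption, and the shifting-factor analysis itself is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
S, n = 5, 1000                            # number of states, samples per row
P = rng.dirichlet(np.ones(S), size=S)     # true transition matrix (rows sum to 1)

P_hat = np.zeros_like(P)
for s in range(S):
    samples = rng.choice(S, size=n, p=P[s])       # n transitions out of state s
    P_hat[s] = np.bincount(samples, minlength=S) / n

print(np.abs(P_hat - P).max())            # row-wise error shrinks like O(1/sqrt(n))
```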
no code implementations • 3 Aug 2021 • Yuhua Zhu, Lexing Ying
The objective function of the variational formulation consists of two parts: one for maximizing the value function and the other for minimizing the Bellman residual.
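In symbols, an objective with this two-part structure can be written schematically as

$$\max_{V}\;\mathbb{E}_{s\sim\mu}\big[V(s)\big]\;-\;\lambda\,\big\|\mathcal{T}V-V\big\|^{2},$$

where $\mathcal{T}$ is the Bellman operator, $\mu$ is a state distribution, and $\lambda>0$ balances the two parts; this is a generic illustration of the structure described in the abstract, not necessarily the paper's exact objective.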
no code implementations • 17 Dec 2020 • Lexing Ying, Yuhua Zhu
This note summarizes the optimization formulations used in the study of Markov decision processes.
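A representative example of such a formulation is the classical primal linear program for the optimal value function of a discounted MDP (a textbook formulation, stated here for concreteness rather than quoted from the note):

$$\min_{V}\;\sum_{s}\mu(s)\,V(s)\quad\text{s.t.}\quad V(s)\;\ge\;r(s,a)+\gamma\sum_{s'}P(s'\mid s,a)\,V(s')\;\;\forall\,(s,a),$$

where $\mu$ is any strictly positive weighting over states; the unique solution is the optimal value function $V^{*}$.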
no code implementations • ICLR 2021 • Jing An, Lexing Ying, Yuhua Zhu
We consider two commonly used techniques, resampling and reweighting, that rebalance the proportions of the subgroups to maintain the desired objective function.
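Concretely, the two techniques differ only in where the correction enters the stochastic gradient. Below is a minimal sketch on a hypothetical two-subgroup problem; all data and constants are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical imbalanced data: subgroup 1 is 10% of samples, subgroup 0 is 90%.
# Both techniques target the same balanced objective with equal subgroup weight.
n = 1000
group = (rng.random(n) < 0.1).astype(int)
x = rng.normal(loc=2.0 * group, scale=1.0)        # features depend on the subgroup
w = np.where(group == 1, 0.5 / 0.1, 0.5 / 0.9)    # importance weights -> balanced mix

theta_rw = theta_rs = 0.0
lr = 0.01
for _ in range(5000):
    # Reweighting: sample uniformly, then scale the gradient by the weight.
    i = rng.integers(n)
    theta_rw -= lr * w[i] * 2 * (theta_rw - x[i])
    # Resampling: draw the subgroup uniformly, then a sample within it,
    # and use a plain, unweighted gradient.
    g = rng.integers(2)
    j = rng.choice(np.flatnonzero(group == g))
    theta_rs -= lr * 2 * (theta_rs - x[j])
# Both stochastic gradients are unbiased for the balanced loss; they differ
# in variance, which is where the two methods part ways under SGD.
```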
no code implementations • 11 Jun 2020 • Yuhua Zhu, Zach Izzo, Lexing Ying
The main idea is to borrow extra randomness from the future to approximately re-sample the next state when the underlying dynamics of the problem are sufficiently smooth.
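To make the idea concrete, here is a tabular sketch of how a future transition can be reused to form an approximate second, independent next-state sample in a residual-gradient-style update. The dynamics, reward, and update rule are illustrative assumptions rather than the paper's exact algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, lr, n_states = 0.9, 0.1, 10
V = np.zeros(n_states)

def step(s):
    return (s + rng.choice([-1, 1])) % n_states   # toy random-walk dynamics (assumption)

def reward(s):
    return 1.0 if s == 0 else 0.0                 # toy reward (assumption)

s = 0
s1 = step(s)                                      # s_{t+1}
for _ in range(10000):
    s2 = step(s1)                                 # s_{t+2}: the "future" transition
    # Borrow the future increment to approximate an independent second sample
    # of the next state: s_tilde = s + (s_{t+2} - s_{t+1}).
    s_tilde = (s + (s2 - s1)) % n_states
    delta = reward(s) + gamma * V[s1] - V[s]      # TD error from the first sample
    V[s] += lr * delta                            # gradient term through V(s)
    V[s_tilde] -= lr * gamma * delta              # term through the second sample
    s, s1 = s1, s2                                # slide the window forward
```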
no code implementations • 3 Dec 2018 • Xiaowu Dai, Yuhua Zhu
In particular, we give an explicit escape time of SGD from a local minimum in the finite-time regime, and prove that SGD tends to converge to flatter minima in the asymptotic regime (although it may take exponential time to converge), regardless of the batch size.