no code implementations • 23 May 2024 • Huy Nguyen, Pedram Akbarian, Trang Pham, Trang Nguyen, Shujian Zhang, Nhat Ho
The cosine router in sparse Mixture of Experts (MoE) has recently emerged as an attractive alternative to the conventional linear router.
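As a rough illustration of the difference (a minimal sketch, not the authors' code; the expert count, dimension, top-k, and temperature below are invented for the example), a linear router scores experts by raw inner products with the token representation, while a cosine router scores them by cosine similarity between L2-normalized token and expert embeddings, sharpened by a temperature:

```python
import numpy as np

rng = np.random.default_rng(0)
d, num_experts, top_k, tau = 16, 8, 2, 0.07  # hidden dim, experts, active experts, temperature

W = rng.normal(size=(num_experts, d))   # expert embeddings / router weights
x = rng.normal(size=(d,))               # one token representation

# Linear router: raw inner products between the token and the expert embeddings.
linear_scores = W @ x

# Cosine router: inner products of L2-normalized vectors, scaled by a temperature.
x_hat = x / np.linalg.norm(x)
W_hat = W / np.linalg.norm(W, axis=1, keepdims=True)
cosine_scores = (W_hat @ x_hat) / tau

def topk_softmax(scores, k):
    """Keep the k largest scores, softmax over them, zero out the rest."""
    idx = np.argpartition(scores, -k)[-k:]
    gates = np.zeros_like(scores)
    z = np.exp(scores[idx] - scores[idx].max())
    gates[idx] = z / z.sum()
    return gates

print("linear gates:", topk_softmax(linear_scores, top_k))
print("cosine gates:", topk_softmax(cosine_scores, top_k))
```

Normalization makes the cosine scores insensitive to the magnitudes of the token and expert vectors, which is the property that distinguishes this router from the linear one.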
no code implementations • 25 Jan 2024 • Huy Nguyen, Pedram Akbarian, Nhat Ho
We demonstrate that, owing to interactions between the temperature and the other model parameters through a set of partial differential equations, the convergence rates of parameter estimation are slower than any polynomial rate and can be as slow as $\mathcal{O}(1/\log(n))$, where $n$ denotes the sample size.
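For a sense of scale (a back-of-the-envelope illustration, not a result from the paper), compare the sample sizes needed to reach a target error $\epsilon$ under the parametric rate and under a logarithmic rate, ignoring constants and logarithmic factors:

```latex
n^{-1/2} \le \epsilon \;\Longleftrightarrow\; n \gtrsim \epsilon^{-2},
\qquad
\frac{1}{\log n} \le \epsilon \;\Longleftrightarrow\; n \gtrsim e^{1/\epsilon}.
```

For $\epsilon = 0.1$ this is $n \gtrsim 10^{2}$ versus $n \gtrsim e^{10} \approx 2.2 \times 10^{4}$, and the gap grows super-polynomially as $\epsilon$ shrinks.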
no code implementations • 22 Oct 2023 • Huy Nguyen, Pedram Akbarian, TrungTin Nguyen, Nhat Ho
The mixture-of-experts (MoE) model incorporates the power of multiple submodels via gating functions to achieve greater performance in numerous regression and classification applications.
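A canonical instance of this setup in the regression setting, written here with illustrative notation rather than the paper's exact symbols, is the softmax-gated Gaussian mixture of experts density:

```latex
f_{G}(y \mid x) \;=\; \sum_{j=1}^{k}
\frac{\exp\bigl(\beta_{1j}^{\top} x + \beta_{0j}\bigr)}
     {\sum_{\ell=1}^{k} \exp\bigl(\beta_{1\ell}^{\top} x + \beta_{0\ell}\bigr)}
\, \mathcal{N}\!\bigl(y \mid a_{j}^{\top} x + b_{j},\, \sigma_{j}^{2}\bigr),
```

where the softmax weights form the gating function and each Gaussian component plays the role of one expert (submodel).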
no code implementations • 25 Sep 2023 • Huy Nguyen, Pedram Akbarian, Fanqi Yan, Nhat Ho
When the true number of experts $k_{\ast}$ is known, we demonstrate that the convergence rates of density estimation and parameter estimation are both parametric in the sample size.
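"Parametric" here means that, up to logarithmic factors, the rates match the familiar $1/\sqrt{n}$ behavior; a typical statement of this kind (notation assumed, not quoted from the paper) reads:

```latex
h\bigl(f_{\widehat{G}_n}(\cdot \mid X),\, f_{G_{*}}(\cdot \mid X)\bigr)
\;=\; \mathcal{O}_{P}\!\Bigl(\sqrt{\tfrac{\log n}{n}}\Bigr),
```

where $h$ denotes the Hellinger distance, $G_{*}$ the true mixing measure, and $\widehat{G}_n$ an estimator fitted with $k_{\ast}$ experts.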