no code implementations • 18 Apr 2024 • Zixiang Chen, Jun Han, YongQian Li, Yiwen Kou, Eran Halperin, Robert E. Tillman, Quanquan Gu
Electronic health records (EHRs) are a pivotal data source that enables numerous applications in computational medicine, e.g., disease progression prediction, clinical trial design, and health economics and outcomes research.
no code implementations • 18 Apr 2024 • Yiwen Kou, Zixiang Chen, Quanquan Gu, Sham M. Kakade
We then demonstrate how a neural network trained with SGD can effectively approximate this good network, solving the $k$-parity problem with small statistical error.
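For intuition, the snippet below is a minimal sketch of the $k$-parity setup only: a two-layer ReLU network trained by minibatch SGD on labels $y = \prod_{i \in S} x_i$ over $\{\pm 1\}^d$. The dimension, hidden subset, width, and learning rate are illustrative assumptions, not the paper's construction or guarantees.

```python
# Minimal sketch (not the paper's construction): the k-parity target on
# {+1,-1}^d and a two-layer ReLU network trained with plain SGD on it.
import torch
import torch.nn as nn

d, k, n = 20, 3, 4096
S = torch.arange(k)                               # hidden parity subset (assumed: first k coords)

X = torch.randint(0, 2, (n, d)).float() * 2 - 1   # uniform inputs in {+1, -1}^d
y = X[:, S].prod(dim=1)                           # k-parity label: product of coords in S

net = nn.Sequential(nn.Linear(d, 512), nn.ReLU(), nn.Linear(512, 1))
opt = torch.optim.SGD(net.parameters(), lr=0.05)

for step in range(2000):
    idx = torch.randint(0, n, (64,))              # minibatch SGD
    loss = ((net(X[idx]).squeeze(-1) - y[idx]) ** 2).mean()
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    acc = (net(X).squeeze(-1).sign() == y).float().mean()
print(f"train accuracy on the k-parity task: {acc:.3f}")
```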
no code implementations • 15 Feb 2024 • Huizhuo Yuan, Zixiang Chen, Kaixuan Ji, Quanquan Gu
Fine-tuning Diffusion Models remains an underexplored frontier in generative artificial intelligence (GenAI), especially when compared with the remarkable progress made in fine-tuning Large Language Models (LLMs).
2 code implementations • 2 Jan 2024 • Zixiang Chen, Yihe Deng, Huizhuo Yuan, Kaixuan Ji, Quanquan Gu
In this paper, we delve into the prospect of growing a strong LLM out of a weak one without the need for acquiring additional human-annotated data.
no code implementations • 14 Dec 2023 • Zixiang Chen, Huizhuo Yuan, YongQian Li, Yiwen Kou, Junkai Zhang, Quanquan Gu
Despite its success in continuous spaces, discrete diffusion models, which apply to domains such as texts and natural languages, remain under-studied and often suffer from slow generation speed.
3 code implementations • 7 Nov 2023 • Yihe Deng, Weitong Zhang, Zixiang Chen, Quanquan Gu
While it is widely acknowledged that the quality of a prompt, such as a question, significantly impacts the quality of the response provided by LLMs, a systematic method for crafting questions that LLMs can better comprehend is still underdeveloped.
no code implementations • 12 Oct 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Peter L. Bartlett
Transformers pretrained on diverse tasks exhibit remarkable in-context learning (ICL) capabilities, enabling them to solve unseen tasks solely based on input contexts without adjusting model parameters.
no code implementations • 2 Oct 2023 • Zixiang Chen, Yihe Deng, Yuanzhi Li, Quanquan Gu
Multi-modal learning has become increasingly popular due to its ability to leverage information from different data sources (e.g., text and images) to improve the model performance.
1 code implementation • 7 Mar 2023 • Yiwen Kou, Zixiang Chen, Yuanzhou Chen, Quanquan Gu
We show that, under mild conditions, the neural network trained by gradient descent can achieve near-zero training loss and Bayes optimal test risk.
no code implementations • 3 Mar 2023 • Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, Quanquan Gu, Sham M. Kakade
On the other hand, we provide some negative results for stochastic gradient descent (SGD) for ReLU regression with symmetric Bernoulli data: if the model is well-specified, the excess risk of SGD is provably no better than that of GLM-tron ignoring constant factors, for each problem instance; and in the noiseless case, GLM-tron can achieve a small risk while SGD unavoidably suffers from a constant risk in expectation.
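For reference, the classic GLM-tron update specialized to ReLU regression is sketched below (a "perceptron-like" step that keeps the link function inside the residual). The Gaussian data and noiseless well-specified labels are illustrative assumptions, not the symmetric Bernoulli setting analyzed in the paper.

```python
# Hedged sketch of the GLM-tron update (Kakade et al., 2011) with a ReLU link.
import numpy as np

rng = np.random.default_rng(0)
relu = lambda z: np.maximum(z, 0.0)

d, n = 10, 2000
w_star = rng.standard_normal(d) / np.sqrt(d)      # ground-truth parameter (assumed)
X = rng.standard_normal((n, d))
y = relu(X @ w_star)                              # well-specified, noiseless labels

w = np.zeros(d)
for t in range(200):
    # GLM-tron: move along the average residual-weighted inputs,
    # with the ReLU link applied inside the residual
    w = w + (1.0 / n) * X.T @ (y - relu(X @ w))

print("excess risk:", np.mean((relu(X @ w) - y) ** 2))
```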
no code implementations • 4 Nov 2022 • Zhengyong Huang, Sijuan Zou, Guoshuai Wang, Zixiang Chen, Hao Shen, HaiYan Wang, Na Zhang, Lu Zhang, Fan Yang, Haining Wang, Dong Liang, Tianye Niu, Xiaohua Zhu, Zhanli Hu
In this paper, we propose a deep learning segmentation method based on multimodal positron emission tomography-computed tomography (PET-CT), which combines the high sensitivity of PET with the precise anatomical information of CT. We design an improved spatial attention network (ISA-Net) to increase the accuracy of PET or CT in detecting tumors; it uses multi-scale convolution operations to extract feature information, highlighting tumor-region location information while suppressing non-tumor-region location information.
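The block below is an illustrative sketch only of a multi-scale spatial attention module in the spirit described above (parallel convolutions at several scales producing a spatial mask that re-weights tumor vs. non-tumor locations); kernel sizes and channel counts are assumptions, and this is not the authors' ISA-Net implementation.

```python
import torch
import torch.nn as nn

class MultiScaleSpatialAttention(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        # parallel convolutions with different receptive fields
        self.branches = nn.ModuleList([
            nn.Conv2d(channels, channels // 4, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5, 7)
        ])
        self.to_mask = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = torch.cat([b(x) for b in self.branches], dim=1)   # multi-scale features
        mask = torch.sigmoid(self.to_mask(feats))                 # spatial attention map in (0, 1)
        return x * mask                                           # emphasize tumor regions, suppress background

# Example: a fused PET-CT feature map of shape (batch, channels, H, W)
x = torch.randn(2, 32, 64, 64)
print(MultiScaleSpatialAttention(32)(x).shape)                    # torch.Size([2, 32, 64, 64])
```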
no code implementations • 30 Sep 2022 • Zixiang Chen, Chris Junchi Li, Angela Yuan, Quanquan Gu, Michael I. Jordan
With the increasing need for handling large state and action spaces, general function approximation has become a key technique in reinforcement learning (RL).
2 code implementations • 4 Aug 2022 • Zixiang Chen, Yihe Deng, Yue Wu, Quanquan Gu, Yuanzhi Li
To our knowledge, this is the first result towards formally understanding the mechanism of the MoE layer for deep learning.
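For readers unfamiliar with the object being analyzed, the snippet below is a minimal sketch of a mixture-of-experts (MoE) layer: a softmax gating network routes each input to a weighted combination of expert networks. The sizes and the dense (non-sparse) routing are illustrative assumptions, not the exact architecture studied in the paper.

```python
import torch
import torch.nn as nn

class MoELayer(nn.Module):
    def __init__(self, dim: int, num_experts: int = 4, hidden: int = 64):
        super().__init__()
        self.gate = nn.Linear(dim, num_experts)                     # gating network
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        weights = torch.softmax(self.gate(x), dim=-1)               # (batch, num_experts)
        outputs = torch.stack([e(x) for e in self.experts], dim=1)  # (batch, num_experts, dim)
        return (weights.unsqueeze(-1) * outputs).sum(dim=1)         # gate-weighted mixture of experts

x = torch.randn(8, 16)
print(MoELayer(16)(x).shape)   # torch.Size([8, 16])
```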
no code implementations • 14 Feb 2022 • Yuan Cao, Zixiang Chen, Mikhail Belkin, Quanquan Gu
In this paper, we study the benign overfitting phenomenon in training a two-layer convolutional neural network (CNN).
no code implementations • NeurIPS 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu
In this paper, we propose LENA (Last stEp shriNkAge), a faster perturbed stochastic gradient framework for finding local minima.
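As background only, the sketch below shows generic perturbed stochastic gradient descent for escaping saddle points (inject isotropic noise when the stochastic gradient is small); LENA's specific last-step shrinkage is not reproduced here, and the toy objective, threshold, and perturbation radius are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def loss_grad(w, batch):
    # toy nonconvex objective f(w) = -w0^2 + w1^4 with a saddle-like point at the origin
    return np.array([-2 * w[0], 4 * w[1] ** 3]) + 0.001 * batch

w = np.zeros(2)
lr, grad_threshold, perturb_radius = 0.01, 0.01, 0.05
for t in range(5000):
    g = loss_grad(w, rng.standard_normal(2))
    if np.linalg.norm(g) <= grad_threshold:
        w = w + perturb_radius * rng.standard_normal(2)   # perturb near a stationary point
    w = w - lr * g

print("iterate after perturbed SGD:", w)
```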
no code implementations • 25 Jun 2021 • Spencer Frei, Difan Zou, Zixiang Chen, Quanquan Gu
We show that there exists a universal constant $C_{\mathrm{err}}>0$ such that if a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$ can achieve classification error at most $C_{\mathrm{err}}$, then for any $\varepsilon>0$, an iterative self-training algorithm initialized at $\boldsymbol{\beta}_0 := \boldsymbol{\beta}_{\mathrm{pl}}$ using pseudolabels $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$ and using at most $\tilde O(d/\varepsilon^2)$ unlabeled examples suffices to learn the Bayes-optimal classifier up to $\varepsilon$ error, where $d$ is the ambient dimension.
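The snippet below is a hedged sketch of the iterative self-training loop described above: start from a pseudolabeler $\boldsymbol{\beta}_{\mathrm{pl}}$, pseudolabel fresh unlabeled points with $\hat y = \mathrm{sgn}(\langle \boldsymbol{\beta}_t, \mathbf{x}\rangle)$, and update the iterate on those pseudolabels. The Gaussian mixture data and the logistic-loss gradient step are illustrative assumptions, not the paper's exact algorithm or analysis.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n_unlabeled, T, lr = 50, 5000, 30, 0.5

mu = np.ones(d) / np.sqrt(d)                            # class mean (assumed mixture model)
beta_star = mu / np.linalg.norm(mu)                     # Bayes-optimal direction
beta = beta_star + 0.6 * rng.standard_normal(d)         # beta_0 := beta_pl (a weak pseudolabeler)

for t in range(T):
    y_true = rng.choice([-1.0, 1.0], size=n_unlabeled)
    X = y_true[:, None] * mu + rng.standard_normal((n_unlabeled, d))
    y_hat = np.sign(X @ beta)                           # pseudolabels from the current iterate
    margins = y_hat * (X @ beta)
    # one gradient step on the logistic loss evaluated with pseudolabels
    grad = -(X * (y_hat / (1 + np.exp(margins)))[:, None]).mean(axis=0)
    beta = beta - lr * grad

cos = beta @ beta_star / np.linalg.norm(beta)
print("alignment with the Bayes-optimal direction:", round(float(cos), 3))
```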
no code implementations • 15 Feb 2021 • Zixiang Chen, Dongruo Zhou, Quanquan Gu
To assess the optimality of our algorithm, we also prove an $\tilde{\Omega}( dH\sqrt{T})$ lower bound on the regret.
no code implementations • NeurIPS 2020 • Zixiang Chen, Yuan Cao, Quanquan Gu, Tong Zhang
In this paper, we provide a generalized neural tangent kernel analysis and show that noisy gradient descent with weight decay can still exhibit a "kernel-like" behavior.
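As a minimal sketch of the training procedure referenced above, the code below runs gradient descent with weight decay and injected Gaussian noise on a wide two-layer network; the noise scale, regularization strength, and network size are illustrative assumptions rather than the paper's settings.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
X, y = torch.randn(256, 10), torch.randn(256, 1)
net = nn.Sequential(nn.Linear(10, 1024), nn.ReLU(), nn.Linear(1024, 1))

lr, weight_decay, noise_std = 1e-2, 1e-4, 1e-3
for step in range(500):
    loss = ((net(X) - y) ** 2).mean()
    net.zero_grad(); loss.backward()
    with torch.no_grad():
        for p in net.parameters():
            # noisy gradient descent with weight decay:
            # p <- p - lr * (grad + weight_decay * p) + Gaussian noise
            p.add_(-lr * (p.grad + weight_decay * p) + noise_std * torch.randn_like(p))
```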
no code implementations • ICLR 2021 • Zixiang Chen, Yuan Cao, Difan Zou, Quanquan Gu
A recent line of research on deep learning focuses on the extremely over-parameterized setting, and shows that when the network width is larger than a high degree polynomial of the training sample size $n$ and the inverse of the target error $\epsilon^{-1}$, deep neural networks learned by (stochastic) gradient descent enjoy nice optimization and generalization guarantees.
1 code implementation • 8 Oct 2018 • Tianyang Hu, Zixiang Chen, Hanxi Sun, Jincheng Bai, Mao Ye, Guang Cheng
We propose two novel samplers to generate high-quality samples from a given (un-normalized) probability density.
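For context on the task of sampling from an un-normalized density, the sketch below runs a standard unadjusted Langevin sampler on a toy target $\exp(-U(x))$; the two samplers proposed in the paper are different and are not reproduced here, and the target, step size, and chain length are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(x):
    # un-normalized target: standard 2-D Gaussian, U(x) = ||x||^2 / 2
    return x

step, n_steps = 0.05, 5000
x = rng.standard_normal(2)
samples = []
for t in range(n_steps):
    # Langevin update: gradient step on U plus Gaussian noise
    x = x - step * grad_U(x) + np.sqrt(2 * step) * rng.standard_normal(2)
    samples.append(x.copy())

samples = np.array(samples[1000:])                      # drop burn-in
print("sample mean:", samples.mean(axis=0), "sample var:", samples.var(axis=0))
```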