no code implementations • 12 Mar 2024 • Hongkang Li, Shuai Zhang, Yihua Zhang, Meng Wang, Sijia Liu, Pin-Yu Chen
Despite algorithmic efforts to improve minority-group accuracy, a theoretical generalization analysis of empirical risk minimization (ERM) on individual groups remains elusive.
no code implementations • 23 Feb 2024 • Hongkang Li, Meng Wang, Songtao Lu, Xiaodong Cui, Pin-Yu Chen
Despite the empirical success, how to train a Transformer to achieve in-context learning (ICL), and the resulting ICL capacity, remain mostly elusive because of the technical challenges of analyzing the nonconvex training problems that arise from the nonlinear self-attention and nonlinear activation in Transformers.
no code implementations • 24 Oct 2023 • Shuai Zhang, Hongkang Li, Meng Wang, Miao Liu, Pin-Yu Chen, Songtao Lu, Sijia Liu, Keerthiram Murugesan, Subhajit Chaudhury
This paper provides the first theoretical convergence and sample complexity analysis of the practical setting of DQNs with $\epsilon$-greedy policy.
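For readers unfamiliar with the $\epsilon$-greedy policy mentioned above, the following is a minimal sketch of the action-selection rule only. It is not the paper's DQN or its analysis; the function name `epsilon_greedy` and the list-of-Q-values input are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index from estimated Q-values (hypothetical helper).

    With probability epsilon a uniformly random action is chosen (exploration);
    otherwise the action with the largest estimated Q-value is chosen (exploitation).
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    # Greedy choice: the first index attaining the largest Q-value.
    return max(range(len(q_values)), key=lambda a: q_values[a])


if __name__ == "__main__":
    q = [0.1, 0.9, 0.3]  # illustrative Q-value estimates for three actions
    print(epsilon_greedy(q, epsilon=0.1))
```

Setting `epsilon=0.0` recovers the purely greedy policy, while `epsilon=1.0` gives uniformly random actions.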
no code implementations • 26 Aug 2023 • Hui Wan, Hongkang Li, Songtao Lu, Xiaodong Cui, Marina Danilevsky
The integration of external personalized context information into document-grounded conversational systems has significant potential business value, but has not been well-studied.
no code implementations • 22 Aug 2023 • Yuankai Luo, Hongkang Li, Lei Shi, Xiao-Ming Wu
Empirically, we demonstrate that graph transformers with HDSE excel in graph classification and regression on 7 graph-level datasets and in node classification on 11 large-scale graphs, including those with up to a billion nodes.
no code implementations • 12 Feb 2023 • Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen
Based on a data model characterizing both label-relevant and label-irrelevant tokens, this paper provides the first theoretical analysis of training a shallow ViT, i.e., one self-attention layer followed by a two-layer perceptron, for a classification task.
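The architecture described above (one self-attention layer followed by a two-layer perceptron) can be sketched as a forward pass in plain Python. This is an illustrative assumption, not the paper's exact parameterization: the single attention head, the ReLU activation, the averaging over tokens, and all names (`shallow_vit_logit`, `W_q`, `W_k`, `W_v`, `W_1`, `a`) are hypothetical choices.

```python
import math


def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def shallow_vit_logit(tokens, W_q, W_k, W_v, W_1, a):
    """One single-head self-attention layer, then a two-layer perceptron.

    Returns a scalar logit averaged over tokens (an assumed readout).
    """
    queries = [matvec(W_q, x) for x in tokens]
    keys = [matvec(W_k, x) for x in tokens]
    values = [matvec(W_v, x) for x in tokens]
    dim = len(values[0])
    per_token = []
    for q in queries:
        # Attention weights of this query over all tokens.
        attn = softmax([sum(qi * ki for qi, ki in zip(q, k)) for k in keys])
        # Attention-weighted sum of the value vectors.
        context = [sum(w * v[j] for w, v in zip(attn, values)) for j in range(dim)]
        # Two-layer perceptron: hidden ReLU layer, then a linear output layer.
        hidden = [max(0.0, h) for h in matvec(W_1, context)]
        per_token.append(sum(ai * hi for ai, hi in zip(a, hidden)))
    return sum(per_token) / len(per_token)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    tokens = [[1.0, 0.0], [0.0, 1.0]]  # two toy 2-dimensional tokens
    print(shallow_vit_logit(tokens, identity, identity, identity, identity, [1.0, -1.0]))
```

The sign of the returned logit would serve as the predicted label in a binary classification setting.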
no code implementations • 7 Jul 2022 • Hongkang Li, Meng Wang, Sijia Liu, Pin-Yu Chen, JinJun Xiong
Graph convolutional networks (GCNs) have recently achieved great empirical success in learning graph-structured data.
no code implementations • 7 Jul 2022 • Hongkang Li, Shuai Zhang, Meng Wang
In addition, for the first time, this paper characterizes the impact of the input distributions on the sample complexity and the learning rate.
no code implementations • 1 Jan 2021 • Hongkang Li, Shuai Zhang, Meng Wang
Instead of following the conventional and restrictive assumption in the literature that the input features follow the standard Gaussian distribution, this paper, for the first time, analyzes a more general and practical scenario in which the input features follow a Gaussian mixture model with a finite number of Gaussian components of varying means and variances.
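To make the input assumption above concrete, here is a minimal sketch of drawing samples from a finite Gaussian mixture. It is simplified to scalar features for brevity (the paper concerns feature vectors), and the function name `sample_gaussian_mixture` and its parameters are hypothetical.

```python
import random


def sample_gaussian_mixture(weights, means, stds, n, seed=0):
    """Draw n scalar samples from a finite Gaussian mixture (illustrative helper).

    weights: mixing proportions of the components (need not be normalized)
    means, stds: per-component mean and standard deviation
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        # Pick a component according to the mixing weights.
        k = rng.choices(range(len(weights)), weights=weights)[0]
        # Then draw from that component's Gaussian.
        samples.append(rng.gauss(means[k], stds[k]))
    return samples


if __name__ == "__main__":
    xs = sample_gaussian_mixture([0.3, 0.7], [-2.0, 1.0], [0.5, 1.5], n=5)
    print(xs)
```

The standard Gaussian assumption is the special case of a single component with mean 0 and standard deviation 1.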