no code implementations • 3 Apr 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
The computational intensity of Large Language Models (LLMs) is a critical bottleneck, primarily due to the $O(n^2)$ complexity of the attention mechanism in transformer architectures.
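The quadratic term comes from materializing the $n \times n$ score matrix $QK^\top$. A minimal sketch of standard single-head attention illustrating this cost (array shapes and names are illustrative, not the paper's notation):

```python
import numpy as np

def attention(Q, K, V):
    """Standard single-head attention; forming the n x n score
    matrix Q K^T is the source of the O(n^2) cost."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)                   # O(n^2 d) time, O(n^2) memory
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # another O(n^2 d) product

n, d = 1024, 64
Q, K, V = (np.random.randn(n, d) for _ in range(3))
out = attention(Q, K, V)  # materializes a 1024 x 1024 weight matrix
```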
no code implementations • 2 Feb 2024 • Yichuan Deng, Zhao Song, Chiwun Yang
Building on SGD, prior work has proposed many algorithms that improve convergence speed and generalization in stochastic optimization, such as SGD with momentum (SGDm), AdaGrad, and Adam.
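For reference, the standard update rules of two of these variants, as a minimal sketch (hyperparameter values are common defaults, not tuned choices from the paper):

```python
import numpy as np

def sgd_momentum(w, g, state, lr=1e-2, beta=0.9):
    """SGDm: v <- beta * v + g, then w <- w - lr * v."""
    v = beta * state.get("v", np.zeros_like(w)) + g
    state["v"] = v
    return w - lr * v

def adam(w, g, state, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """Adam: bias-corrected first- and second-moment estimates."""
    t = state.get("t", 0) + 1
    m = b1 * state.get("m", np.zeros_like(w)) + (1 - b1) * g
    v = b2 * state.get("v", np.zeros_like(w)) + (1 - b2) * g**2
    state.update(m=m, v=v, t=t)
    m_hat, v_hat = m / (1 - b1**t), v / (1 - b2**t)
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# usage: minimize ||w||^2 with Adam (gradient is 2w)
w, state = np.array([1.0, -2.0]), {}
for _ in range(200):
    w = adam(w, 2 * w, state)
```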
no code implementations • 19 Oct 2023 • Yichuan Deng, Zhao Song, Shenghao Xie, Chiwun Yang
In the realm of deep learning, transformers have emerged as a dominant architecture, particularly in natural language processing tasks.
no code implementations • 18 Oct 2023 • Yichuan Deng, Zhao Song, Tianyi Zhou
Large transformer models have achieved state-of-the-art results in numerous natural language processing tasks.
no code implementations • 21 Aug 2023 • Yichuan Deng, Michalis Mamakos, Zhao Song
Thus, maximizing the total reward requires learning not only models of the reward and the resource consumption, but also the cluster memberships.
no code implementations • 16 Aug 2023 • Yichuan Deng, Zhao Song, Shenghao Xie
The softmax unit and the ReLU unit are the key structures in attention computation.
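A minimal sketch of the two units as row-normalizers over attention scores (the normalization convention for the ReLU variant is an assumption for illustration):

```python
import numpy as np

def softmax_unit(scores):
    """exp-then-normalize, as in standard attention."""
    z = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return z / z.sum(axis=-1, keepdims=True)

def relu_unit(scores):
    """ReLU variant: clip negatives, then row-normalize
    (guarding against all-zero rows)."""
    z = np.maximum(scores, 0.0)
    denom = z.sum(axis=-1, keepdims=True)
    return z / np.where(denom > 0, denom, 1.0)
```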
no code implementations • 17 Jul 2023 • Yichuan Deng, Zhihang Li, Sridhar Mahadevan, Zhao Song
We demonstrate the convergence of our algorithm, highlighting its effectiveness in efficiently computing gradients for large-scale LLMs.
no code implementations • 1 Jun 2023 • Yichuan Deng, Zhao Song, Junze Yin
Tensor decomposition is a fundamental method used in various areas to deal with high-dimensional data.
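As a concrete illustration, a rank-$r$ CP decomposition represents a third-order tensor as a sum of $r$ outer products, storing it far more compactly (a minimal example, not the paper's algorithm):

```python
import numpy as np

r = 2  # CP rank: T = sum_{i=1}^r a_i (outer) b_i (outer) c_i
a, b, c = np.random.randn(r, 4), np.random.randn(r, 5), np.random.randn(r, 6)
T = sum(np.einsum('i,j,k->ijk', a[i], b[i], c[i]) for i in range(r))
print(T.shape)  # (4, 5, 6): 120 entries stored as r*(4+5+6) = 30 numbers
```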
no code implementations • 20 Apr 2023 • Yichuan Deng, Zhihang Li, Zhao Song
One of the key computations in LLMs is the softmax unit.
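One common way to formalize the softmax unit is as a softmax regression problem; a hedged sketch of such an objective (the exact formulation studied in the paper may differ):

```python
import numpy as np

def softmax_regression_loss(A, x, b):
    """Hypothetical objective L(x) = || exp(Ax)/<exp(Ax), 1> - b ||_2^2.
    Shifting by the max leaves the normalized vector unchanged and
    avoids overflow."""
    u = np.exp(A @ x - np.max(A @ x))
    return np.sum((u / u.sum() - b) ** 2)
```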
no code implementations • 13 Apr 2023 • Yichuan Deng, Yeqi Gao, Zhao Song
For the tensor classical (CP) rank, the Tucker rank, and the tensor train rank, this problem has been well studied in [Song, Woodruff, Zhong SODA 2019].
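(Recall the standard definitions: the CP rank of a third-order tensor $A$ is the smallest $r$ with $A = \sum_{i=1}^{r} u_i \otimes v_i \otimes w_i$; the Tucker rank is the tuple of ranks of the three mode unfoldings $A_{(1)}, A_{(2)}, A_{(3)}$; and the tensor train rank is the tuple of ranks of the sequential unfoldings that define the TT cores.)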
no code implementations • 10 Apr 2023 • Yichuan Deng, Sridhar Mahadevan, Zhao Song
It runs in $\widetilde{O}(\mathrm{nnz}(X) + n^{\omega})$ time, succeeds with probability $1-\delta$, and chooses $m = O(n \log(n/\delta))$.
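The $\mathrm{nnz}(X)$ term is characteristic of sparse sketching transforms such as CountSketch; whether this paper uses that particular sketch is an assumption, but a minimal sketch of the primitive:

```python
import numpy as np

def countsketch_apply(X, m, seed=0):
    """Apply a CountSketch matrix S (one random +-1 per column of S)
    to X: hash each row of X to one of m buckets and add it with a
    random sign. With sparse rows this runs in O(nnz(X)) time;
    it is written densely here for simplicity."""
    rng = np.random.default_rng(seed)
    N, d = X.shape
    h = rng.integers(0, m, size=N)       # bucket index per row
    s = rng.choice([-1.0, 1.0], size=N)  # random sign per row
    SX = np.zeros((m, d))
    for i in range(N):
        SX[h[i]] += s[i] * X[i]
    return SX
```

After reducing to $m = O(n \log(n/\delta))$ rows, solving the small sketched problem with fast matrix multiplication accounts for the $n^{\omega}$ term.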
no code implementations • 8 Mar 2023 • Yichuan Deng, Zhao Song, Zifan Wang, Han Zhang
The kernel method, which is commonly used in learning algorithms such as Support Vector Machines (SVMs), has also been applied in PCA algorithms.
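A minimal kernel PCA sketch (RBF kernel; the `gamma` parameter and the centering step follow the textbook formulation, not necessarily this paper's algorithm):

```python
import numpy as np

def kernel_pca(X, k, gamma=1.0):
    """Textbook kernel PCA: form the RBF Gram matrix, double-center
    it, and project the data onto the top-k eigenvectors."""
    sq = np.sum(X**2, axis=1)
    K = np.exp(-gamma * (sq[:, None] + sq[None, :] - 2 * X @ X.T))
    n = K.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n
    Kc = J @ K @ J                    # center in feature space
    vals, vecs = np.linalg.eigh(Kc)   # eigenvalues in ascending order
    idx = np.argsort(vals)[::-1][:k]  # top-k components
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0))
```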
no code implementations • 9 Aug 2022 • Yichuan Deng, Hang Hu, Zhao Song, Omri Weinstein, Danyang Zhuo
The success of deep learning comes at a tremendous computational and energy cost, and the scalability of training massively overparametrized neural networks is becoming a real barrier to the progress of artificial intelligence (AI).
no code implementations • 19 Feb 2022 • Rui Duan, Hui Deng, Mao Tian, Yichuan Deng, Jiarui Lin
In this manner, this research contributes a large-scale image dataset for developing deep learning-based object detection methods in the construction industry and establishes a performance benchmark for evaluating such algorithms in this area.