1 code implementation • 3 Oct 2023 • Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications.
2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
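To make the snippet concrete: the mixture proportions are the weights used when sampling pretraining batches across domains. Below is a minimal sketch of what those weights control; the domain names, documents, and weight values are illustrative, not the paper's.

```python
import random

# Toy corpora standing in for pretraining domains (illustrative data).
domains = {
    "wikipedia": ["wiki doc 1", "wiki doc 2"],
    "books":     ["book doc 1", "book doc 2"],
    "web":       ["web doc 1", "web doc 2"],
}

# Mixture proportions: the knob the snippet refers to (values here are made up).
weights = {"wikipedia": 0.25, "books": 0.25, "web": 0.50}

def sample_batch(batch_size: int) -> list[str]:
    """Draw a pretraining batch whose expected domain composition follows `weights`."""
    names = list(domains)
    probs = [weights[n] for n in names]
    return [random.choice(domains[random.choices(names, weights=probs, k=1)[0]])
            for _ in range(batch_size)]

print(sample_batch(4))
```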
no code implementations • 13 Jul 2022 • Ruoxi Sun, Hanjun Dai, Adams Wei Yu
Extracting informative representations of molecules using graph neural networks (GNNs) is crucial in AI-driven drug discovery.
1 code implementation • CVPR 2022 • Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan
In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.
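The cross-attention fusion mentioned for LearnableAlign can be sketched as lidar features attending over image features. The module below is a minimal illustration in that spirit; the dimensions, projections, and output head are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Lidar features (queries) attend over image features (keys/values);
    the attended image context is fused back into the lidar features.
    Illustrative sketch, not the paper's LearnableAlign module."""

    def __init__(self, lidar_dim=128, image_dim=256, attn_dim=128):
        super().__init__()
        self.q = nn.Linear(lidar_dim, attn_dim)            # queries from lidar
        self.k = nn.Linear(image_dim, attn_dim)            # keys from image
        self.v = nn.Linear(image_dim, attn_dim)            # values from image
        self.out = nn.Linear(lidar_dim + attn_dim, lidar_dim)

    def forward(self, lidar_feats, image_feats):
        # lidar_feats: (N, lidar_dim); image_feats: (M, image_dim)
        q, k, v = self.q(lidar_feats), self.k(image_feats), self.v(image_feats)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)  # (N, M)
        fused = attn @ v                                            # (N, attn_dim)
        return self.out(torch.cat([lidar_feats, fused], dim=-1))

fusion = CrossAttentionFusion()
print(fusion(torch.randn(100, 128), torch.randn(400, 256)).shape)  # (100, 128)
```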
no code implementations • 13 Dec 2021 • Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui
Scaling language models with more data, compute and parameters has driven significant progress in natural language processing.
Ranked #10 on Language Modelling on LAMBADA
no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le
While increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.
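For context, contrastive image-text models of this kind are typically trained with a symmetric InfoNCE-style objective in which every other pair in the batch serves as a negative, which is why the batch size matters. A minimal sketch of that standard objective follows; it is not BASIC's exact training setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of paired
    image/text embeddings; larger batches supply more in-batch negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature                  # (B, B) similarities
    targets = torch.arange(logits.shape[0], device=logits.device)  # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

print(contrastive_loss(torch.randn(32, 512), torch.randn(32, 512)))
```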
no code implementations • 19 Sep 2021 • ZiRui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao
This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data.
5 code implementations • ICLR 2022 • Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.
Ranked #1 on Question Answering on OBQA
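Instruction tuning amounts to rendering ordinary supervised examples as natural-language instructions and finetuning on the resulting (prompt, target) pairs. Below is a minimal sketch of that data transformation; the templates are illustrative, not FLAN's actual ones.

```python
# Illustrative instruction templates (FLAN's real templates differ).
TEMPLATES = [
    "Is the sentiment of the following review positive or negative?\n\n{text}",
    "Review: {text}\nWhat is the sentiment of this review?",
]

def to_instruction_examples(text: str, label: str) -> list[dict]:
    """Render one labeled example under several instruction phrasings,
    producing (prompt, target) pairs for instruction finetuning."""
    return [{"prompt": t.format(text=text), "target": label} for t in TEMPLATES]

print(to_instruction_examples("A wonderful, heartfelt film.", "positive"))
```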
2 code implementations • ICLR 2022 • ZiRui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.
Ranked #4 on Visual Entailment on SNLI-VE val
no code implementations • NeurIPS 2020 • Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou
Despite achieving tremendous success, existing deep learning models have exposed limitations in compositional generalization, the capability to learn compositional rules and apply them to unseen cases in a systematic manner.
no code implementations • 5 Jun 2020 • Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le
Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.
no code implementations • ICLR 2020 • Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le
Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are difficult to scale to more domains or more complex reasoning.
Ranked #5 on Question Answering on DROP Test
15 code implementations • ICLR 2018 • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models.
Ranked #27 on Question Answering on SQuAD1.1 dev
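The speedups come from replacing recurrence with convolution and self-attention, both of which parallelize over the sequence length. Below is a minimal encoder block in that spirit; the layer sizes and arrangement are illustrative, not QANet's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Recurrence-free encoder block: depthwise-separable convolution
    followed by self-attention, with residual connections."""

    def __init__(self, dim=128, kernel=7, heads=8):
        super().__init__()
        self.depthwise = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, seq_len, dim)
        h = self.norm1(x).transpose(1, 2)      # to (batch, dim, seq_len) for conv
        x = x + self.pointwise(self.depthwise(h)).transpose(1, 2)
        h = self.norm2(x)
        x = x + self.attn(h, h, h)[0]          # self-attention sub-layer
        return x

block = ConvAttentionBlock()
print(block(torch.randn(4, 50, 128)).shape)   # (4, 50, 128)
```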
no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao
We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, using sparse additive models (SpAMs).
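Schematically, each series is regressed on nonparametric functions of the other series' lagged values, and series j is flagged as a cause of series i when the fitted component f_{ij} is nonzero; a sparsity penalty drives most components to zero. The notation below is mine, not the paper's exact formulation.

```latex
X^{(i)}_t = \sum_{j=1}^{p} f_{ij}\!\left(X^{(j)}_{t-1}\right) + \epsilon^{(i)}_t,
\qquad
\min_{\{f_{ij}\}} \; \sum_{i,t} \Big( X^{(i)}_t - \sum_{j} f_{ij}\big(X^{(j)}_{t-1}\big) \Big)^{2}
\;+\; \lambda \sum_{i,j} \|f_{ij}\|.
```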
1 code implementation • The Thirty-Second AAAI Conference on Artificial Intelligence 2018 • Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li
In this paper, we generalize square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward neural networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
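A rectangular matrix with orthonormal columns is a point on a Stiefel manifold, and the nearest such matrix to a given weight matrix can be obtained from its SVD. The retraction below is a generic illustration of the constraint, not the paper's OMDSM algorithm.

```python
import torch

def orthogonalize(w: torch.Tensor) -> torch.Tensor:
    """Project a rectangular weight matrix (rows >= cols) onto the Stiefel
    manifold of orthonormal-column matrices: the closest point in Frobenius norm."""
    u, _, vt = torch.linalg.svd(w, full_matrices=False)
    return u @ vt

w = torch.randn(64, 32)
q = orthogonalize(w)
print(torch.allclose(q.T @ q, torch.eye(32), atol=1e-5))  # True
```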
no code implementations • ICLR 2018 • Minh-Thang Luong, David Dohan, Adams Wei Yu, Quoc V. Le, Barret Zoph, Vijay Vasudevan
Neural architecture search (NAS), the task of finding neural architectures automatically, has recently emerged as a promising approach for discovering models that outperform human-designed ones.
no code implementations • 13 Oct 2017 • Lin Xiao, Adams Wei Yu, Qihang Lin, Weizhu Chen
Machine learning with big data often involves large optimization models.
1 code implementation • 16 Sep 2017 • Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li
In this paper, we generalize square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward neural networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
2 code implementations • ICLR 2018 • Adams Wei Yu, Lei Huang, Qihang Lin, Ruslan Salakhutdinov, Jaime Carbonell
In this paper, we propose a generic and simple strategy for utilizing stochastic gradient information in optimization.
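One simple way to use stochastic gradient information beyond the raw gradient is to normalize it per parameter block before the update, decoupling the step size from the gradient's magnitude. The sketch below illustrates that general idea under this assumption; the paper's exact scheme may differ.

```python
import torch

def normalized_sgd_step(params, lr=0.01, eps=1e-8):
    """One step of block-wise normalized gradient descent: each parameter
    block's gradient is rescaled to unit norm before being applied."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad / (p.grad.norm() + eps)
```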
4 code implementations • ACL 2017 • Adams Wei Yu, Hongrae Lee, Quoc V. Le
Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.
no code implementations • 23 Feb 2016 • Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu
We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints.
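The algorithm itself is short: ordinary power iteration in which every matrix-vector product is corrupted by noise, modeling the perturbations that communication, memory, or privacy constraints introduce. A minimal sketch follows, with illustrative parameters.

```python
import numpy as np

def noisy_power_method(A, iters=50, noise_scale=1e-3, seed=0):
    """Approximate the top eigenvector of a symmetric matrix A using
    power iteration with a noisy matrix-vector product at every step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        g = A @ x + noise_scale * rng.standard_normal(A.shape[0])  # noisy step
        x = g / np.linalg.norm(g)
    return x

A = np.diag([5.0, 2.0, 1.0])
print(noisy_power_method(A))  # close to +/- e_1, the top eigenvector
```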
no code implementations • 9 Jan 2016 • Yining Wang, Adams Wei Yu, Aarti Singh
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points.
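A classical instance of this selection problem picks the k rows of a design matrix that maximize the log-determinant of the information matrix (D-optimality). The greedy baseline below illustrates the problem setup; it is not the paper's method.

```python
import numpy as np

def greedy_d_optimal(X, k, ridge=1e-6):
    """Greedily select k design points (rows of X) maximizing
    log det(sum_i x_i x_i^T), the D-optimality criterion."""
    n, d = X.shape
    chosen, M = [], ridge * np.eye(d)
    for _ in range(k):
        scores = [np.linalg.slogdet(M + np.outer(x, x))[1] if i not in chosen
                  else -np.inf for i, x in enumerate(X)]
        best = int(np.argmax(scores))
        chosen.append(best)
        M += np.outer(X[best], X[best])
    return chosen

print(greedy_d_optimal(np.random.randn(100, 5), k=10))
```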
no code implementations • 20 Aug 2015 • Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola
We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.
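In the delayed gradient model, a worker's gradient is computed at a stale copy of the parameters, so the server update takes the form (notation mine):

```latex
x_{t+1} = x_t - \eta_t \, g\!\left(x_{t - \tau_t}\right),
```

where g is a stochastic gradient and \tau_t is the delay between the worker reading the parameters and the server applying its update.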
no code implementations • 14 Aug 2015 • Adams Wei Yu, Qihang Lin, Tianbao Yang
We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem.
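The bilinear saddle-point reformulation of regularized empirical risk minimization mentioned in the snippet has the standard form (notation mine):

```latex
\min_{x \in \mathbb{R}^d} \; \max_{y \in \mathbb{R}^n} \;
\frac{1}{n} \, y^{\top} A x \;+\; g(x) \;-\; \frac{1}{n} \sum_{i=1}^{n} \phi_i^{*}(y_i),
```

where the rows of A are the data points, \phi_i^* is the convex conjugate of the i-th loss, and g is the regularizer; "doubly stochastic" refers to sampling both a primal coordinate of x and a dual coordinate of y at each iteration.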
no code implementations • NeurIPS 2014 • Adams Wei Yu, Wanli Ma, YaoLiang Yu, Jaime Carbonell, Suvrit Sra
We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.
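Written out, the problem class is nuclear-norm-regularized estimation composed with a structure-encoding linear map (notation mine):

```latex
\min_{y} \; \ell(y) \;+\; \lambda \, \big\| \mathcal{A}(y) \big\|_{*},
```

where \mathcal{A} is the linear map encoding the structure (e.g., mapping parameters to a Hankel matrix), \|\cdot\|_* is the nuclear norm promoting low rank, and \ell is a data-fitting loss.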