1 code implementation • 3 Oct 2023 • Jie Huang, Xinyun Chen, Swaroop Mishra, Huaixiu Steven Zheng, Adams Wei Yu, Xinying Song, Denny Zhou
Large Language Models (LLMs) have emerged as a groundbreaking technology with their unparalleled text generation capabilities across various applications.
2 code implementations • NeurIPS 2023 • Sang Michael Xie, Hieu Pham, Xuanyi Dong, Nan Du, Hanxiao Liu, Yifeng Lu, Percy Liang, Quoc V. Le, Tengyu Ma, Adams Wei Yu
The mixture proportions of pretraining data domains (e.g., Wikipedia, books, web text) greatly affect language model (LM) performance.
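To make the snippet concrete: the mixture proportions are the weights used when sampling pretraining batches across domains. Below is a minimal sketch of what those weights control; the domain names, documents, and weight values are illustrative, not the paper's.

```python
import random

# Toy corpora standing in for pretraining domains (illustrative data).
domains = {
    "wikipedia": ["wiki doc 1", "wiki doc 2"],
    "books":     ["book doc 1", "book doc 2"],
    "web":       ["web doc 1", "web doc 2"],
}

# Mixture proportions: the knob the snippet refers to (values here are made up).
weights = {"wikipedia": 0.25, "books": 0.25, "web": 0.50}

def sample_batch(batch_size: int) -> list[str]:
    """Draw a pretraining batch whose expected domain composition follows `weights`."""
    names = list(domains)
    probs = [weights[n] for n in names]
    return [random.choice(domains[random.choices(names, weights=probs, k=1)[0]])
            for _ in range(batch_size)]

print(sample_batch(4))
```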
no code implementations • 13 Jul 2022 • Ruoxi Sun, Hanjun Dai, Adams Wei Yu
Extracting informative representations of molecules using graph neural networks (GNNs) is crucial in AI-driven drug discovery.
1 code implementation • CVPR 2022 • Yingwei Li, Adams Wei Yu, Tianjian Meng, Ben Caine, Jiquan Ngiam, Daiyi Peng, Junyang Shen, Bo Wu, Yifeng Lu, Denny Zhou, Quoc V. Le, Alan Yuille, Mingxing Tan
In this paper, we propose two novel techniques: InverseAug, which inverts geometry-related augmentations (e.g., rotation) to enable accurate geometric alignment between lidar points and image pixels, and LearnableAlign, which leverages cross-attention to dynamically capture the correlations between image and lidar features during fusion.
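The cross-attention fusion mentioned for LearnableAlign can be sketched as lidar features attending over image features. The module below is a minimal illustration in that spirit; the dimensions, projections, and output head are assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    """Lidar features (queries) attend over image features (keys/values);
    the attended image context is fused back into the lidar features.
    Illustrative sketch, not the paper's LearnableAlign module."""

    def __init__(self, lidar_dim=128, image_dim=256, attn_dim=128):
        super().__init__()
        self.q = nn.Linear(lidar_dim, attn_dim)            # queries from lidar
        self.k = nn.Linear(image_dim, attn_dim)            # keys from image
        self.v = nn.Linear(image_dim, attn_dim)            # values from image
        self.out = nn.Linear(lidar_dim + attn_dim, lidar_dim)

    def forward(self, lidar_feats, image_feats):
        # lidar_feats: (N, lidar_dim); image_feats: (M, image_dim)
        q, k, v = self.q(lidar_feats), self.k(image_feats), self.v(image_feats)
        attn = torch.softmax(q @ k.T / q.shape[-1] ** 0.5, dim=-1)  # (N, M)
        fused = attn @ v                                            # (N, attn_dim)
        return self.out(torch.cat([lidar_feats, fused], dim=-1))

fusion = CrossAttentionFusion()
print(fusion(torch.randn(100, 128), torch.randn(400, 256)).shape)  # (100, 128)
```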
no code implementations • 13 Dec 2021 • Nan Du, Yanping Huang, Andrew M. Dai, Simon Tong, Dmitry Lepikhin, Yuanzhong Xu, Maxim Krikun, Yanqi Zhou, Adams Wei Yu, Orhan Firat, Barret Zoph, Liam Fedus, Maarten Bosma, Zongwei Zhou, Tao Wang, Yu Emma Wang, Kellie Webster, Marie Pellat, Kevin Robinson, Kathleen Meier-Hellstern, Toju Duke, Lucas Dixon, Kun Zhang, Quoc V Le, Yonghui Wu, Zhifeng Chen, Claire Cui
Scaling language models with more data, compute and parameters has driven significant progress in natural language processing.
Ranked #10 on Language Modelling on LAMBADA
no code implementations • 19 Nov 2021 • Hieu Pham, Zihang Dai, Golnaz Ghiasi, Kenji Kawaguchi, Hanxiao Liu, Adams Wei Yu, Jiahui Yu, Yi-Ting Chen, Minh-Thang Luong, Yonghui Wu, Mingxing Tan, Quoc V. Le
While increasing the dataset size and the model size has been the de facto method to improve the performance of deep learning models like BASIC, the effect of a large contrastive batch size on such contrastive-trained image-text models is not well understood.
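For context, contrastive image-text models of this kind are typically trained with a symmetric InfoNCE-style objective in which every other pair in the batch serves as a negative, which is why the batch size matters. A minimal sketch of that standard objective follows; it is not BASIC's exact training setup.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric CLIP-style contrastive loss over a batch of paired
    image/text embeddings; larger batches supply more in-batch negatives."""
    image_emb = F.normalize(image_emb, dim=-1)
    text_emb = F.normalize(text_emb, dim=-1)
    logits = image_emb @ text_emb.T / temperature                  # (B, B) similarities
    targets = torch.arange(logits.shape[0], device=logits.device)  # matched pairs on diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

print(contrastive_loss(torch.randn(32, 512), torch.randn(32, 512)))
```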
no code implementations • 19 Sep 2021 • ZiRui Wang, Adams Wei Yu, Orhan Firat, Yuan Cao
This paper explores zero-label learning in Natural Language Processing (NLP), whereby no human-annotated data is used anywhere during training and models are trained purely on synthetic data.
5 code implementations • ICLR 2022 • Jason Wei, Maarten Bosma, Vincent Y. Zhao, Kelvin Guu, Adams Wei Yu, Brian Lester, Nan Du, Andrew M. Dai, Quoc V. Le
We show that instruction tuning -- finetuning language models on a collection of tasks described via instructions -- substantially improves zero-shot performance on unseen tasks.
Ranked #1 on Question Answering on OBQA
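Instruction tuning amounts to rendering ordinary supervised examples as natural-language instructions and finetuning on the resulting (prompt, target) pairs. Below is a minimal sketch of that data transformation; the templates are illustrative, not FLAN's actual ones.

```python
# Illustrative instruction templates (FLAN's real templates differ).
TEMPLATES = [
    "Is the sentiment of the following review positive or negative?\n\n{text}",
    "Review: {text}\nWhat is the sentiment of this review?",
]

def to_instruction_examples(text: str, label: str) -> list[dict]:
    """Render one labeled example under several instruction phrasings,
    producing (prompt, target) pairs for instruction finetuning."""
    return [{"prompt": t.format(text=text), "target": label} for t in TEMPLATES]

print(to_instruction_examples("A wonderful, heartfelt film.", "positive"))
```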
2 code implementations • ICLR 2022 • ZiRui Wang, Jiahui Yu, Adams Wei Yu, Zihang Dai, Yulia Tsvetkov, Yuan Cao
With recent progress in joint modeling of visual and textual representations, Vision-Language Pretraining (VLP) has achieved impressive performance on many multimodal downstream tasks.
Ranked #4 on Visual Entailment on SNLI-VE val
no code implementations • NeurIPS 2020 • Xinyun Chen, Chen Liang, Adams Wei Yu, Dawn Song, Denny Zhou
Despite achieving tremendous success, existing deep learning models have exposed limitations in compositional generalization, the capability to learn compositional rules and apply them to unseen cases in a systematic manner.
no code implementations • 5 Jun 2020 • Xuanyi Dong, Mingxing Tan, Adams Wei Yu, Daiyi Peng, Bogdan Gabrys, Quoc V. Le
Efficient hyperparameter or architecture search methods have shown remarkable results, but each of them is only applicable to searching for either hyperparameters (HPs) or architectures.
no code implementations • ICLR 2020 • Xinyun Chen, Chen Liang, Adams Wei Yu, Denny Zhou, Dawn Song, Quoc V. Le
Integrating distributed representations with symbolic operations is essential for reading comprehension requiring complex reasoning, such as counting, sorting and arithmetic, but most existing approaches are difficult to scale to more domains or more complex reasoning.
Ranked #5 on Question Answering on DROP Test
15 code implementations • ICLR 2018 • Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, Quoc V. Le
On the SQuAD dataset, our model is 3x to 13x faster in training and 4x to 9x faster in inference, while achieving equivalent accuracy to recurrent models.
Ranked #27 on Question Answering on SQuAD1.1 dev
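The speedups come from replacing recurrence with convolution and self-attention, both of which parallelize over the sequence length. Below is a minimal encoder block in that spirit; the layer sizes and arrangement are illustrative, not QANet's exact architecture.

```python
import torch
import torch.nn as nn

class ConvAttentionBlock(nn.Module):
    """Recurrence-free encoder block: depthwise-separable convolution
    followed by self-attention, with residual connections."""

    def __init__(self, dim=128, kernel=7, heads=8):
        super().__init__()
        self.depthwise = nn.Conv1d(dim, dim, kernel, padding=kernel // 2, groups=dim)
        self.pointwise = nn.Conv1d(dim, dim, 1)
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm1, self.norm2 = nn.LayerNorm(dim), nn.LayerNorm(dim)

    def forward(self, x):                     # x: (batch, seq_len, dim)
        h = self.norm1(x).transpose(1, 2)      # to (batch, dim, seq_len) for conv
        x = x + self.pointwise(self.depthwise(h)).transpose(1, 2)
        h = self.norm2(x)
        x = x + self.attn(h, h, h)[0]          # self-attention sub-layer
        return x

block = ConvAttentionBlock()
print(block(torch.randn(4, 50, 128)).shape)   # (4, 50, 128)
```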
no code implementations • 11 Mar 2018 • Yingxiang Yang, Adams Wei Yu, Zhaoran Wang, Tuo Zhao
We propose a nonparametric method for detecting nonlinear causal relationships within a set of multidimensional discrete time series, using sparse additive models (SpAMs).
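Schematically, each series is regressed on nonparametric functions of the other series' lagged values, and series j is flagged as a cause of series i when the fitted component f_{ij} is nonzero; a sparsity penalty drives most components to zero. The notation below is mine, not the paper's exact formulation.

```latex
X^{(i)}_t = \sum_{j=1}^{p} f_{ij}\!\left(X^{(j)}_{t-1}\right) + \epsilon^{(i)}_t,
\qquad
\min_{\{f_{ij}\}} \; \sum_{i,t} \Big( X^{(i)}_t - \sum_{j} f_{ij}\big(X^{(j)}_{t-1}\big) \Big)^{2}
\;+\; \lambda \sum_{i,j} \|f_{ij}\|.
```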
1 code implementation • The Thirty-Second AAAI Conference on Artificial Intelligence 2018 • Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li
In this paper, we generalize square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward neural networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
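A rectangular matrix with orthonormal columns is a point on a Stiefel manifold, and the nearest such matrix to a given weight matrix can be obtained from its SVD. The retraction below is a generic illustration of the constraint, not the paper's OMDSM algorithm.

```python
import torch

def orthogonalize(w: torch.Tensor) -> torch.Tensor:
    """Project a rectangular weight matrix (rows >= cols) onto the Stiefel
    manifold of orthonormal-column matrices: the closest point in Frobenius norm."""
    u, _, vt = torch.linalg.svd(w, full_matrices=False)
    return u @ vt

w = torch.randn(64, 32)
q = orthogonalize(w)
print(torch.allclose(q.T @ q, torch.eye(32), atol=1e-5))  # True
```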
no code implementations • ICLR 2018 • Minh-Thang Luong, David Dohan, Adams Wei Yu, Quoc V. Le, Barret Zoph, Vijay Vasudevan
Neural architecture search (NAS), the task of finding neural architectures automatically, has recently emerged as a promising approach for discovering models that outperform human-designed ones.
no code implementations • 13 Oct 2017 • Lin Xiao, Adams Wei Yu, Qihang Lin, Weizhu Chen
Machine learning with big data often involves large optimization models.
1 code implementation • 16 Sep 2017 • Lei Huang, Xianglong Liu, Bo Lang, Adams Wei Yu, Yongliang Wang, Bo Li
In this paper, we generalize square orthogonal matrices to orthogonal rectangular matrices and formulate this problem in feed-forward neural networks (FNNs) as Optimization over Multiple Dependent Stiefel Manifolds (OMDSM).
2 code implementations • ICLR 2018 • Adams Wei Yu, Lei Huang, Qihang Lin, Ruslan Salakhutdinov, Jaime Carbonell
In this paper, we propose a generic and simple strategy for utilizing stochastic gradient information in optimization.
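One simple way to use stochastic gradient information beyond the raw gradient is to normalize it per parameter block before the update, decoupling the step size from the gradient's magnitude. The sketch below illustrates that general idea under this assumption; the paper's exact scheme may differ.

```python
import torch

def normalized_sgd_step(params, lr=0.01, eps=1e-8):
    """One step of block-wise normalized gradient descent: each parameter
    block's gradient is rescaled to unit norm before being applied."""
    with torch.no_grad():
        for p in params:
            if p.grad is not None:
                p -= lr * p.grad / (p.grad.norm() + eps)
```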
4 code implementations • ACL 2017 • Adams Wei Yu, Hongrae Lee, Quoc V. Le
Recurrent Neural Networks are showing much promise in many sub-areas of natural language processing, ranging from document classification to machine translation to automatic question answering.
no code implementations • 23 Feb 2016 • Maria Florina Balcan, Simon S. Du, Yining Wang, Adams Wei Yu
We consider the noisy power method algorithm, which has wide applications in machine learning and statistics, especially those related to principal component analysis (PCA) under resource (communication, memory or privacy) constraints.
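The algorithm itself is short: ordinary power iteration in which every matrix-vector product is corrupted by noise, modeling the perturbations that communication, memory, or privacy constraints introduce. A minimal sketch follows, with illustrative parameters.

```python
import numpy as np

def noisy_power_method(A, iters=50, noise_scale=1e-3, seed=0):
    """Approximate the top eigenvector of a symmetric matrix A using
    power iteration with a noisy matrix-vector product at every step."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(A.shape[0])
    x /= np.linalg.norm(x)
    for _ in range(iters):
        g = A @ x + noise_scale * rng.standard_normal(A.shape[0])  # noisy step
        x = g / np.linalg.norm(g)
    return x

A = np.diag([5.0, 2.0, 1.0])
print(noisy_power_method(A))  # close to +/- e_1, the top eigenvector
```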
no code implementations • 9 Jan 2016 • Yining Wang, Adams Wei Yu, Aarti Singh
We derive computationally tractable methods to select a small subset of experiment settings from a large pool of given design points.
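A classical instance of this selection problem picks the k rows of a design matrix that maximize the log-determinant of the information matrix (D-optimality). The greedy baseline below illustrates the problem setup; it is not the paper's method.

```python
import numpy as np

def greedy_d_optimal(X, k, ridge=1e-6):
    """Greedily select k design points (rows of X) maximizing
    log det(sum_i x_i x_i^T), the D-optimality criterion."""
    n, d = X.shape
    chosen, M = [], ridge * np.eye(d)
    for _ in range(k):
        scores = [np.linalg.slogdet(M + np.outer(x, x))[1] if i not in chosen
                  else -np.inf for i, x in enumerate(X)]
        best = int(np.argmax(scores))
        chosen.append(best)
        M += np.outer(X[best], X[best])
    return chosen

print(greedy_d_optimal(np.random.randn(100, 5), k=10))
```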
no code implementations • 20 Aug 2015 • Suvrit Sra, Adams Wei Yu, Mu Li, Alexander J. Smola
We study distributed stochastic convex optimization under the delayed gradient model where the server nodes perform parameter updates, while the worker nodes compute stochastic gradients.
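In the delayed gradient model, a worker's gradient is computed at a stale copy of the parameters, so the server update takes the form (notation mine):

```latex
x_{t+1} = x_t - \eta_t \, g\!\left(x_{t - \tau_t}\right),
```

where g is a stochastic gradient and \tau_t is the delay between the worker reading the parameters and the server applying its update.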
no code implementations • 14 Aug 2015 • Adams Wei Yu, Qihang Lin, Tianbao Yang
We propose a doubly stochastic primal-dual coordinate optimization algorithm for empirical risk minimization, which can be formulated as a bilinear saddle-point problem.
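The bilinear saddle-point reformulation of regularized empirical risk minimization mentioned in the snippet has the standard form (notation mine):

```latex
\min_{x \in \mathbb{R}^d} \; \max_{y \in \mathbb{R}^n} \;
\frac{1}{n} \, y^{\top} A x \;+\; g(x) \;-\; \frac{1}{n} \sum_{i=1}^{n} \phi_i^{*}(y_i),
```

where the rows of A are the data points, \phi_i^* is the convex conjugate of the i-th loss, and g is the regularizer; "doubly stochastic" refers to sampling both a primal coordinate of x and a dual coordinate of y at each iteration.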
no code implementations • NeurIPS 2014 • Adams Wei Yu, Wanli Ma, YaoLiang Yu, Jaime Carbonell, Suvrit Sra
We study the problem of finding structured low-rank matrices using nuclear norm regularization where the structure is encoded by a linear map.
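Written out, the problem class is nuclear-norm-regularized estimation composed with a structure-encoding linear map (notation mine):

```latex
\min_{y} \; \ell(y) \;+\; \lambda \, \big\| \mathcal{A}(y) \big\|_{*},
```

where \mathcal{A} is the linear map encoding the structure (e.g., mapping parameters to a Hankel matrix), \|\cdot\|_* is the nuclear norm promoting low rank, and \ell is a data-fitting loss.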