1 code implementation • 16 Oct 2022 • Nilesh Gupta, Patrick H. Chen, Hsiang-Fu Yu, Cho-Jui Hsieh, Inderjit S. Dhillon
A popular approach for dealing with the large label space is to arrange the labels into a shallow tree-based index and then learn an ML model to efficiently search this index via beam search.
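The snippet above describes the standard recipe of arranging labels into a shallow tree index and searching it with beam search. Below is a minimal sketch of that search loop; the toy tree, dot-product scorer, and beam width are illustrative assumptions, not this paper's actual model.

```python
import numpy as np

def beam_search_labels(x, tree, node_weights, beam_width=2):
    """Beam search through a shallow label tree.

    x            : feature vector for one instance
    tree         : dict mapping node id -> list of child node ids
                   (leaves map to an empty list and stand for labels)
    node_weights : dict mapping node id -> weight vector; the score of a
                   node is a plain dot product here, standing in for the
                   learned matcher a real model would use
    """
    beam = [(0.0, "root")]                       # (cumulative score, node id)
    while any(tree[node] for _, node in beam):   # stop once the beam is all leaves
        candidates = []
        for score, node in beam:
            children = tree[node]
            if not children:                     # already a leaf: carry it forward
                candidates.append((score, node))
                continue
            for child in children:
                candidates.append((score + float(x @ node_weights[child]), child))
        beam = sorted(candidates, reverse=True)[:beam_width]  # keep best partial paths
    return beam                                  # highest-scoring leaf labels

# Toy example: a two-level tree with four leaf labels.
rng = np.random.default_rng(0)
tree = {"root": ["n0", "n1"], "n0": ["l0", "l1"], "n1": ["l2", "l3"],
        "l0": [], "l1": [], "l2": [], "l3": []}
weights = {n: rng.normal(size=8) for n in tree}
print(beam_search_labels(rng.normal(size=8), tree, weights, beam_width=2))
```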
no code implementations • 22 Jun 2022 • Patrick H. Chen, Wei-Cheng Chang, Hsiang-Fu Yu, Inderjit S. Dhillon, Cho-Jui Hsieh
Approximate K-Nearest Neighbor Search (AKNNS) has now become ubiquitous in modern applications, for example, as a fast search procedure with two-tower deep learning models.
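For context, here is a minimal sketch of the exact inner-product search that AKNNS methods approximate in the two-tower setting; the linear "towers", dimensions, and random data are stand-in assumptions, not anything from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "two tower" encoders: in practice these are deep networks that
# map queries and items into a shared embedding space.
W_query, W_item = rng.normal(size=(16, 32)), rng.normal(size=(16, 32))
items = rng.normal(size=(10_000, 32))           # raw item features
item_emb = items @ W_item.T                     # precomputed item embeddings

def exact_knn(query, k=5):
    """Brute-force K-nearest-neighbor search by inner product.

    AKNNS methods (graphs, quantization, partitioning) approximate exactly
    this computation so that not every item has to be scored per query.
    """
    scores = item_emb @ (query @ W_query.T)     # inner products against all items
    return np.argpartition(-scores, k)[:k]      # indices of the top-k items

print(exact_knn(rng.normal(size=32)))
```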
no code implementations • 3 Dec 2019 • Patrick H. Chen, Wei Wei, Cho-Jui Hsieh, Bo Dai
In this paper, we propose a new method to overcome catastrophic forgetting by adding generative regularization to the Bayesian inference framework.
no code implementations • IJCNLP 2019 • Yukun Ma, Patrick H. Chen, Cho-Jui Hsieh
For example, input embedding and Softmax matrices in the IWSLT-2014 German-to-English data set account for more than 80% of the total model parameters.
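A rough parameter count illustrates how the embedding and softmax matrices can dominate a small translation model; the vocabulary size, embedding dimension, and recurrent sizes below are illustrative assumptions, not the exact IWSLT-2014 configuration.

```python
# Illustrative (assumed) sizes, not the paper's exact setup.
vocab, d_emb, d_hid = 32_000, 512, 512

embedding = vocab * d_emb                     # input embedding matrix
softmax   = vocab * d_emb                     # output projection (softmax) matrix
recurrent = 2 * 4 * (d_emb + d_hid) * d_hid   # rough LSTM encoder + decoder cost

total = embedding + softmax + recurrent
print(f"embedding + softmax share: {(embedding + softmax) / total:.0%}")  # ~89%
```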
no code implementations • 25 Sep 2019 • Patrick H. Chen, Sashank Reddi, Sanjiv Kumar, Cho-Jui Hsieh
We consider the learning to learn problem, where the goal is to leverage deep learning models to automatically learn (iterative) optimization algorithms for training machine learning models.
no code implementations • TACL 2019 • Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang
Contextual representation models have achieved great success in improving various downstream natural language processing tasks.
no code implementations • 28 Feb 2019 • Liunian Harold Li, Patrick H. Chen, Cho-Jui Hsieh, Kai-Wei Chang
Our framework reduces the time spent on the output layer to a negligible level, eliminates almost all the trainable parameters of the softmax layer and performs language modeling without truncating the vocabulary.
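One way to eliminate the trainable softmax parameters, consistent with what this snippet describes but offered here only as a hedged sketch rather than the paper's exact formulation, is to regress the model's hidden state onto a frozen matrix of pre-trained word embeddings, so the output layer has nothing to train and never sums over the vocabulary.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50_000, 300

# Frozen pre-trained word embeddings (random here just to make the sketch
# self-contained); these replace the trainable softmax matrix.
pretrained_emb = rng.normal(size=(vocab, d))

def output_loss(hidden, target_id):
    """Continuous-output loss: match the target word's fixed embedding."""
    diff = hidden - pretrained_emb[target_id]
    return float(diff @ diff)                 # squared L2 distance, no softmax sum

print(output_loss(rng.normal(size=d), target_id=42))
```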
no code implementations • ICLR 2019 • Patrick H. Chen, Si Si, Sanjiv Kumar, Yang Li, Cho-Jui Hsieh
The algorithm achieves an order of magnitude faster inference than the original softmax layer for predicting the top-$k$ words in various tasks such as beam search in machine translation or next-word prediction.
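As a generic illustration of why restricting the softmax helps (a hedged sketch, not this paper's specific screening model), compare scoring the full vocabulary against scoring only a small precomputed candidate set chosen by a cheap screening step; the clusters and candidate lists below are random placeholders.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab, d = 50_000, 256
W_out = rng.normal(size=(vocab, d))           # full softmax / output embedding matrix

# Illustrative shortlist setup: each context cluster keeps a small candidate
# word set.  How clusters and candidates are chosen is what a learned
# screening model would decide; random sets are used here only as stand-ins.
centroids  = rng.normal(size=(8, d))
candidates = [rng.choice(vocab, size=500, replace=False) for _ in range(8)]

def topk_full(h, k=5):
    """Baseline: score every word in the vocabulary."""
    return np.argpartition(-(W_out @ h), k)[:k]

def topk_screened(h, k=5):
    """Screen first, then score only the shortlisted candidates."""
    cluster = int(np.argmax(centroids @ h))   # cheap screening step
    cand = candidates[cluster]
    scores = W_out[cand] @ h                  # partial inner products only
    return cand[np.argpartition(-scores, k)[:k]]

h = rng.normal(size=d)
print(topk_full(h), topk_screened(h))
```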
no code implementations • NeurIPS 2018 • Patrick H. Chen, Si Si, Yang Li, Ciprian Chelba, Cho-Jui Hsieh
Model compression is essential for serving large deep neural nets on devices with limited resources or applications that require real-time responses.
no code implementations • ICLR 2018 • Patrick H. Chen, Cho-Jui Hsieh
Although many second-order methods have been proposed for training neural networks, most of the results were obtained on smaller single-layer fully connected networks, so we still cannot conclude whether they are useful for training deep convolutional networks.
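Most matrix-free second-order training methods (e.g., Newton-CG or Hessian-free optimization) build on Hessian-vector products computed by double backprop; a minimal sketch is below, using a tiny least-squares model as a stand-in for a real network.

```python
import torch

# Toy regression problem standing in for a neural network's loss.
x = torch.randn(32, 10)
y = torch.randn(32, 1)
w = torch.randn(10, 1, requires_grad=True)

loss = ((x @ w - y) ** 2).mean()
(grad,) = torch.autograd.grad(loss, w, create_graph=True)   # keep graph for 2nd pass

v = torch.randn_like(w)                                      # direction to multiply by
(hvp,) = torch.autograd.grad((grad * v).sum(), w)            # H @ v without forming H
print(hvp.shape)
```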