no code implementations • 27 Mar 2024 • Yanyu Li, Xian Liu, Anil Kag, Ju Hu, Yerlan Idelbayev, Dhritiman Sagar, Yanzhi Wang, Sergey Tulyakov, Jian Ren
Our findings reveal that, instead of replacing the CLIP text encoder used in Stable Diffusion with other large language models, we can enhance it through our proposed fine-tuning approach, TextCraftor, leading to substantial improvements in quantitative benchmarks and human assessments.
no code implementations • 22 Feb 2024 • Willi Menapace, Aliaksandr Siarohin, Ivan Skorokhodov, Ekaterina Deyneka, Tsai-Shien Chen, Anil Kag, Yuwei Fang, Aleksei Stoliar, Elisa Ricci, Jian Ren, Sergey Tulyakov
Since video content is highly redundant, we argue that naively bringing advances of image models to the video generation domain reduces motion fidelity, visual quality and impairs scalability.
Ranked #1 on Text-to-Video Generation on MSR-VTT
1 code implementation • International Conference on Learning Representations 2023 • Anil Kag, Durmus Alp Emre Acar, Aditya Gangrade, Venkatesh Saligrama
We propose a novel knowledge distillation (KD) method that selectively instills teacher knowledge into a student model, motivated by settings where the student's capacity is significantly smaller than the teacher's.
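A minimal sketch of selective distillation in this spirit: standard cross-entropy plus a KD term applied only on a selected subset of examples. The selection rule used here (distill only where the teacher is correct) is a hypothetical stand-in for illustration, not the paper's actual criterion.

```python
import numpy as np

def softmax(z, T=1.0):
    z = z / T
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def selective_kd_loss(student_logits, teacher_logits, labels, T=2.0):
    """Cross-entropy plus a distillation term applied only on examples
    the teacher classifies correctly (a hypothetical selection rule)."""
    p_s = softmax(student_logits)
    ce = -np.log(p_s[np.arange(len(labels)), labels] + 1e-12).mean()

    mask = teacher_logits.argmax(axis=1) == labels   # select teacher-correct examples
    p_t = softmax(teacher_logits, T)
    log_ratio = np.log((p_t + 1e-12) / (softmax(student_logits, T) + 1e-12))
    kl = (p_t * log_ratio).sum(axis=1)               # per-example KL(teacher || student)
    distill = (kl * mask).sum() / max(mask.sum(), 1)
    return ce + (T ** 2) * distill
```

The temperature-squared scaling keeps the gradient magnitude of the soft term comparable to the hard cross-entropy term, a common KD convention.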
1 code implementation • International Conference on Learning Representations 2023 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
Training such a hybrid learner is difficult since we lack annotations indicating which examples are hard for the edge model.
1 code implementation • CVPR 2022 • Anil Kag, Venkatesh Saligrama
Convolutional neural networks (CNNs) rely on the depth of the architecture to obtain complex features.
1 code implementation • NeurIPS 2021 • Aditya Gangrade, Anil Kag, Ashok Cutkosky, Venkatesh Saligrama
For example, this may model an adaptive decision to invoke more resources on this instance.
1 code implementation • 29 Sep 2021 • Anil Kag, Igor Fedorov, Aditya Gangrade, Paul Whatmough, Venkatesh Saligrama
The first network is a low-capacity network that can be deployed on an edge device, whereas the second is a high-capacity network deployed in the cloud.
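The two-network setup above can be sketched as a simple confidence-threshold router: run the cheap edge model on everything and query the cloud model only on uncertain inputs. The fixed threshold here is an illustrative baseline; the paper learns when to query rather than thresholding a fixed score.

```python
import numpy as np

def route_and_predict(x, edge_model, cloud_model, tau=0.8):
    """Predict with the low-capacity edge model; forward only
    low-confidence inputs to the high-capacity cloud model.
    (Hypothetical threshold router for illustration.)"""
    edge_probs = edge_model(x)              # (n, num_classes) probabilities
    conf = edge_probs.max(axis=1)
    to_cloud = conf < tau                   # uncertain instances go to the cloud
    preds = edge_probs.argmax(axis=1)
    if to_cloud.any():
        preds[to_cloud] = cloud_model(x[to_cloud]).argmax(axis=1)
    return preds, to_cloud.mean()           # predictions, cloud-query rate
```

The cloud-query rate directly controls the accuracy/latency trade-off: lowering `tau` keeps more traffic on the edge device at the cost of accuracy on hard inputs.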
1 code implementation • International Conference on Machine Learning 2021 • Anil Kag, Venkatesh Saligrama
BPTT updates RNN parameters for a given instance by back-propagating the error through time over the entire sequence length, which leads to poor trainability due to the well-known gradient explosion/decay phenomena.
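The explosion/decay phenomenon can be seen in a one-line computation: for a linear recurrence h_t = W h_{t-1}, the gradient with respect to h_0 is a product of T Jacobians, so its norm scales roughly like the spectral radius of W raised to the sequence length. A minimal numerical sketch:

```python
import numpy as np

def grad_norm_through_time(rho, T):
    """Magnitude of d h_T / d h_0 for the scalar recurrence h_t = rho * h_{t-1}:
    the chain rule multiplies one Jacobian per step, giving |rho|**T."""
    W = np.array([[rho]])        # 1-D recurrence for illustration
    g = np.eye(1)
    for _ in range(T):
        g = W @ g                # one Jacobian factor per time step
    return float(abs(g[0, 0]))
```

With rho < 1 the gradient vanishes exponentially in T; with rho > 1 it explodes, which is exactly what makes long-sequence BPTT hard to train.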
1 code implementation • CVPR 2021 • Anil Kag, Venkatesh Saligrama
We propose a learning method that dynamically modifies the time constants of the continuous-time counterpart of a vanilla RNN.
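One way to picture this: take the continuous-time vanilla RNN ODE, dh/dt = (-h + tanh(W_h h + W_x x + b)) / tau, discretize it with an Euler step, and let tau be predicted per step from the input. The gating parameterization below (`Wg`, `bg`) is a hypothetical illustration, not the paper's exact form.

```python
import numpy as np

def ct_rnn_step(h, x, Wh, Wx, b, Wg, bg, dt=1.0):
    """One Euler step of a continuous-time vanilla RNN with an
    input-dependent time constant tau > 1 (illustrative parameterization)."""
    tau = 1.0 + np.exp(Wg @ x + bg)          # per-unit time constant, always > 1
    target = np.tanh(Wh @ h + Wx @ x + b)    # standard vanilla-RNN update target
    return h + (dt / tau) * (target - h)     # relax h toward target at rate 1/tau
```

Because dt/tau lies in (0, 1), each step is a convex combination of the old state and the tanh target, so the hidden state stays bounded; a large tau lets a unit retain information over many steps.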
1 code implementation • 15 Oct 2020 • Aditya Gangrade, Anil Kag, Venkatesh Saligrama
We propose a novel method for selective classification (SC), a problem which allows a classifier to abstain from predicting some instances, thus trading off accuracy against coverage (the fraction of instances predicted).
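The accuracy/coverage trade-off above can be made concrete with the simplest selective-classification baseline: abstain whenever the top softmax probability falls below a threshold. This thresholding rule is only a baseline for illustration; the paper learns the abstain region jointly with the classifier.

```python
import numpy as np

def selective_metrics(probs, labels, threshold):
    """Abstain when max probability < threshold; report coverage
    (fraction of instances predicted) and accuracy on the accepted set."""
    conf = probs.max(axis=1)
    accept = conf >= threshold               # predicted instances
    coverage = accept.mean()
    if accept.any():
        acc = (probs[accept].argmax(axis=1) == labels[accept]).mean()
    else:
        acc = float("nan")                   # no predictions made
    return coverage, acc
```

Sweeping the threshold traces out the risk-coverage curve: higher thresholds raise selective accuracy but predict fewer instances.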
1 code implementation • ICLR 2020 • Anil Kag, Ziming Zhang, Venkatesh Saligrama
Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate.
no code implementations • 22 Aug 2019 • Anil Kag, Ziming Zhang, Venkatesh Saligrama
Recurrent neural networks (RNNs) are particularly well-suited for modeling long-term dependencies in sequential data, but are notoriously hard to train because the error backpropagated in time either vanishes or explodes at an exponential rate.
no code implementations • 2 Mar 2019 • Ziming Zhang, Anil Kag, Alan Sullivan, Venkatesh Saligrama
We show that such self-feedback helps stabilize the hidden state transitions leading to fast convergence during training while efficiently learning discriminative latent features that result in state-of-the-art results on several benchmark datasets at test-time.
no code implementations • NIPS Workshop CDNNRIA 2018 • Sivaramakrishnan Sankarapandian, Anil Kag, Rachel Manzelli, Brian Kulis
We describe a training strategy that grows the number of units during training, and show on several benchmark datasets that our model yields architectures that are smaller than those obtained when tuning the number of hidden units on a standard fixed architecture.