Search Results for author: Ashish Seth

Found 12 papers, 7 papers with code

Stable Distillation: Regularizing Continued Pre-training for Low-Resource Automatic Speech Recognition

1 code implementation • 20 Dec 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Specifically, first, we perform vanilla continued pre-training on an initial SSL pre-trained model on the target domain ASR dataset and call it the teacher.

Automatic Speech Recognition (ASR) +1

FusDom: Combining In-Domain and Out-of-Domain Knowledge for Continuous Self-Supervised Learning

1 code implementation • 20 Dec 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Continued pre-training (CP) offers multiple advantages, like target domain adaptation and the potential to exploit the continuous stream of unlabeled data available online.

Domain Adaptation Self-Supervised Learning

CompA: Addressing the Gap in Compositional Reasoning in Audio-Language Models

no code implementations • 12 Oct 2023 • Sreyan Ghosh, Ashish Seth, Sonal Kumar, Utkarsh Tyagi, Chandra Kiran Evuru, S. Ramaneswaran, S. Sakshi, Oriol Nieto, Ramani Duraiswami, Dinesh Manocha

In this paper, we propose CompA, a collection of two expert-annotated benchmarks with a majority of real-world audio samples, to evaluate compositional reasoning in ALMs.

Attribute Audio Classification +1

DeAR: Debiasing Vision-Language Models with Additive Residuals

no code implementations • CVPR 2023 • Ashish Seth, Mayur Hemani, Chirag Agarwal

These biases manifest as the skewed similarity between the representations for specific text concepts and images of people of different identity groups and, therefore, limit the usefulness of such models in real-world high-stakes applications.

Attribute Benchmarking +2

UNFUSED: UNsupervised Finetuning Using SElf supervised Distillation

1 code implementation • 10 Mar 2023 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

Unlike prior works, which directly fine-tune a self-supervised pre-trained encoder on a target dataset, we use the encoder to generate pseudo-labels for unsupervised fine-tuning before the actual fine-tuning step.

Audio Classification Self-Supervised Learning

SLICER: Learning universal audio representations using low-resource self-supervised pre-training

1 code implementation • 2 Nov 2022 • Ashish Seth, Sreyan Ghosh, S. Umesh, Dinesh Manocha

We present a new Self-Supervised Learning (SSL) approach to pre-train encoders on unlabeled audio data that reduces the need for large amounts of labeled data for audio and speech classification.

Audio Classification Clustering +3

MAST: Multiscale Audio Spectrogram Transformers

1 code implementation • 2 Nov 2022 • Sreyan Ghosh, Ashish Seth, S. Umesh, Dinesh Manocha

We present Multiscale Audio Spectrogram Transformer (MAST) for audio classification, which brings the concept of multiscale feature hierarchies to the Audio Spectrogram Transformer (AST).

Audio Classification Keyword Spotting +1

Technology Pipeline for Large Scale Cross-Lingual Dubbing of Lecture Videos into Multiple Indian Languages

no code implementations • 1 Nov 2022 • Anusha Prakash, Arun Kumar, Ashish Seth, Bhagyashree Mukherjee, Ishika Gupta, Jom Kuriakose, Jordan Fernandes, K V Vikram, Mano Ranjith Kumar M, Metilda Sagaya Mary, Mohammad Wajahat, Mohana N, Mudit Batra, Navina K, Nihal John George, Nithya Ravi, Pruthwik Mishra, Sudhanshu Srivastava, Vasista Sai Lodagala, Vandan Mujadia, Kada Sai Venkata Vineeth, Vrunda Sukhadia, Dipti Sharma, Hema Murthy, Pushpak Bhattacharya, S Umesh, Rajeev Sangal

Cross-lingual dubbing of lecture videos requires the transcription of the original audio, correction and removal of disfluencies, domain term discovery, text-to-text translation into the target language, chunking of text using target language rhythm, text-to-speech synthesis followed by isochronous lipsyncing to the original video.

Chunking Speech Synthesis +1

Analyzing the factors affecting usefulness of Self-Supervised Pre-trained Representations for Speech Recognition

no code implementations • 31 Mar 2022 • Ashish Seth, Lodagala V S V Durga Prasad, Sreyan Ghosh, S. Umesh

Self-supervised learning (SSL) to learn high-level speech representations has been a popular approach to building Automatic Speech Recognition (ASR) systems in low-resource settings.

Automatic Speech Recognition (ASR) +2

DeLoRes: Decorrelating Latent Spaces for Low-Resource Audio Representation Learning

1 code implementation • 25 Mar 2022 • Sreyan Ghosh, Ashish Seth, Deepak Mittal, Maneesh Singh, S. Umesh

Inspired by the recent progress in self-supervised learning for computer vision, in this paper we introduce DeLoRes, a new general-purpose audio representation learning approach.

Representation Learning Self-Supervised Learning +1

DECAR: Deep Clustering for learning general-purpose Audio Representations

1 code implementation • 17 Oct 2021 • Sreyan Ghosh, Sandesh V Katta, Ashish Seth, S. Umesh

We introduce DECAR, a self-supervised pre-training approach for learning general-purpose audio representations.

Clustering Deep Clustering +2

Dual Script E2E framework for Multilingual and Code-Switching ASR

no code implementations • 2 Jun 2021 • Mari Ganesh Kumar, Jom Kuriakose, Anand Thyagachandran, Arun Kumar A, Ashish Seth, Lodagala Durga Prasad, Saish Jaiswal, Anusha Prakash, Hema Murthy

In the first system, the E2E model is trained on the CLS representation, and we use a novel data-driven back-end to recover the native language script.

Automatic Speech Recognition (ASR) +3
