Search Results for author: Jakob Drachmann Havtorn

Found 6 papers, 1 papers with code

MSViT: Dynamic Mixed-Scale Tokenization for Vision Transformers

1 code implementation • 5 Jul 2023 • Jakob Drachmann Havtorn, Amelie Royer, Tijmen Blankevoort, Babak Ehteshami Bejnordi

The input tokens to Vision Transformers carry little semantic meaning as they are defined as regular equal-sized patches of the input image, regardless of its content.

Paper
Code

A Brief Overview of Unsupervised Neural Speech Representation Learning

no code implementations • 1 Mar 2022 • Lasse Borgholt, Jakob Drachmann Havtorn, Joakim Edin, Lars Maaløe, Christian Igel

Unsupervised representation learning for speech processing has matured greatly in the last few years.

Representation Learning

Paper
Add Code

Do We Still Need Automatic Speech Recognition for Spoken Language Understanding?

no code implementations • 29 Nov 2021 • Lasse Borgholt, Jakob Drachmann Havtorn, Mostafa Abdou, Joakim Edin, Lars Maaløe, Anders Søgaard, Christian Igel

We compare learned speech features from wav2vec 2. 0, state-of-the-art ASR transcripts, and the ground truth text as input for a novel speech-based named entity recognition task, a cardiac arrest detection task on real-world emergency calls and two existing SLU benchmarks.

Ranked #7 on Spoken Language Understanding on Fluent Speech Commands (using extra training data)

Automatic Speech Recognition Automatic Speech Recognition (ASR) +8

Paper
Add Code

Towards Generative Latent Variable Models for Speech

no code implementations • 29 Sep 2021 • Jakob Drachmann Havtorn, Lasse Borgholt, Jes Frellsen, Søren Hauberg, Lars Maaløe

While stochastic latent variable models (LVMs) now achieve state-of-the-art performance on natural image generation, they are still inferior to deterministic models on speech.

Image Generation Video Generation

Paper
Add Code

Do End-to-End Speech Recognition Models Care About Context?

no code implementations • 17 Feb 2021 • Lasse Borgholt, Jakob Drachmann Havtorn, Željko Agić, Anders Søgaard, Lars Maaløe, Christian Igel

We test this hypothesis by measuring temporal context sensitivity and evaluate how the models perform when we constrain the amount of contextual information in the audio input.

Decoder Language Modelling +2

Paper
Add Code

On Scaling Contrastive Representations for Low-Resource Speech Recognition

no code implementations • 1 Feb 2021 • Lasse Borgholt, Tycho Max Sylvester Tax, Jakob Drachmann Havtorn, Lars Maaløe, Christian Igel

We explore the performance of such systems without fine-tuning by training a state-of-the-art speech recognizer on the fixed representations from the computationally demanding wav2vec 2. 0 framework.

Self-Supervised Learning speech-recognition +1

Paper
Add Code

Cannot find the paper you are looking for? You can Submit a new open access paper.