no code implementations • 24 May 2023 • Eliya Nachmani, Alon Levkovitch, Roy Hirsch, Julian Salazar, Chulayuth Asawaroengchai, Soroosh Mariooryad, Ehud Rivlin, RJ Skerry-Ryan, Michelle Tadmor Ramanovich
Key to our approach is a training objective that jointly supervises speech recognition, text continuation, and speech synthesis using only speech-text pairs, enabling a 'cross-modal' chain-of-thought within a single decoding pass.
1 code implementation • 22 May 2023 • Jianfeng He, Julian Salazar, Kaisheng Yao, Haoqi Li, Jinglun Cai
End-to-end (E2E) spoken language understanding (SLU) is constrained by the cost of collecting speech-semantics pairs, especially when label domains change.
Tasks: Natural Language Understanding, Spoken Language Understanding
1 code implementation • 7 Jul 2022 • Zejiang Hou, Julian Salazar, George Polovets
Large pretrained language models (PLMs) are often domain- or task-adapted via fine-tuning or prompting.
no code implementations • NAACL 2021 • Ethan A. Chi, Julian Salazar, Katrin Kirchhoff
Non-autoregressive models greatly improve decoding speed over typical sequence-to-sequence models, but suffer from degraded performance.
no code implementations • 15 Oct 2020 • Phillip Keung, Julian Salazar, Yichao Lu, Noah A. Smith
We then improve an XLM-based unsupervised neural MT system pre-trained on Wikipedia by supplementing it with pseudo-parallel text mined from the same corpus, boosting unsupervised translation performance by up to 3.5 BLEU on the WMT'14 French-English and WMT'16 German-English tasks and outperforming the previous state-of-the-art.
no code implementations • EMNLP 2020 • Phillip Keung, Yichao Lu, Julian Salazar, Vikas Bhardwaj
Multilingual contextual embeddings have demonstrated state-of-the-art performance in zero-shot cross-lingual transfer learning, where multilingual BERT is fine-tuned on one source language and evaluated on a different target language.
1 code implementation • 12 Feb 2020 • Phillip Keung, Wei Niu, Yichao Lu, Julian Salazar, Vikas Bhardwaj
We discuss the problem of echographic transcription in autoregressive sequence-to-sequence attentional architectures for automatic speech recognition, where a model produces very long sequences of repetitive outputs when presented with out-of-domain utterances.
Tasks: Automatic Speech Recognition (ASR) +2
1 code implementation • 3 Dec 2019 • Shaoshi Ling, Yuzong Liu, Julian Salazar, Katrin Kirchhoff
We propose a novel approach to semi-supervised automatic speech recognition (ASR).
Tasks: Automatic Speech Recognition (ASR) +2
6 code implementations • ACL 2020 • Julian Salazar, Davis Liang, Toan Q. Nguyen, Katrin Kirchhoff
Instead, we evaluate MLMs out of the box via their pseudo-log-likelihood scores (PLLs), which are computed by masking tokens one by one.
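The PLL described above sums, over positions, the model's log-probability of each token when only that token is masked. A minimal sketch of that loop is below; `masked_lm_prob` is a hypothetical stand-in for a real MLM forward pass (e.g. reading BERT's softmax probability at the [MASK] position), not the authors' code, and here it just returns a constant so the sketch runs end to end.

```python
import math

MASK = "[MASK]"

def masked_lm_prob(masked_tokens, target):
    """Toy stand-in for an MLM's P(target | masked context).

    A real implementation would run a masked LM on `masked_tokens` and
    read off the probability of `target` at the [MASK] position. A fixed
    value is returned here purely so the sketch is runnable.
    """
    return 0.25  # placeholder; a real MLM returns a context-dependent value

def pseudo_log_likelihood(tokens):
    """PLL(W) = sum over t of log P(w_t | W with position t masked)."""
    total = 0.0
    for t, token in enumerate(tokens):
        # Mask exactly one position, score the original token there.
        masked = tokens[:t] + [MASK] + tokens[t + 1:]
        total += math.log(masked_lm_prob(masked, token))
    return total
```

Note that scoring a length-T sequence this way costs T forward passes, one per masked position, which is why PLLs are an out-of-the-box *evaluation* rather than a training objective here.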
3 code implementations • EMNLP (IWSLT) 2019 • Toan Q. Nguyen, Julian Salazar
We evaluate three simple, normalization-centric changes to improve Transformer training.
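If memory serves, the changes in this paper include pre-norm residual connections and an l2-based ScaleNorm; under that assumption, both can be sketched in a few lines of plain Python (a real implementation would use learned parameters and tensor ops).

```python
import math

def scale_norm(x, g=1.0, eps=1e-5):
    """ScaleNorm: rescale vector x to length g.

    In the paper g is a single learned scalar; it is fixed here for
    illustration. Contrast with LayerNorm, which has per-dimension
    gain/bias and subtracts the mean.
    """
    norm = math.sqrt(sum(v * v for v in x)) + eps
    return [g * v / norm for v in x]

def prenorm_residual(x, sublayer):
    """Pre-norm residual: x + F(norm(x)), versus post-norm's norm(x + F(x)).

    Pre-norm keeps an identity path through the residual stream, the
    usual motivation for its easier optimization in deep Transformers.
    """
    return [a + b for a, b in zip(x, sublayer(scale_norm(x)))]
```

For example, `prenorm_residual(x, attention_block)` would normalize the input, apply the sublayer, and add the raw (un-normalized) input back, leaving the residual path untouched.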
Ranked #4 on Machine Translation on IWSLT2015 English-Vietnamese
1 code implementation • 30 Jun 2019 • Shaoshi Ling, Julian Salazar, Yuzong Liu, Katrin Kirchhoff
We introduce BERTphone, a Transformer encoder trained on large speech corpora that outputs phonetically-aware contextual representation vectors that can be used for both speaker and language recognition.
1 code implementation • 22 Jan 2019 • Julian Salazar, Katrin Kirchhoff, Zhiheng Huang
The success of self-attention in NLP has led to recent applications in end-to-end encoder-decoder architectures for speech recognition.