no code implementations • 23 May 2024 • Chan-Jan Hsu, Yi-Chang Chen, Feng-Ting Liao, Pei-Chen Ho, Yu-Hsiang Wang, Po-chun Hsu, Da-Shan Shiu
We introduce "Generative Fusion Decoding" (GFD), a novel shallow fusion framework, utilized to integrate Large Language Models (LLMs) into multi-modal text recognition systems such as automatic speech recognition (ASR) and optical character recognition (OCR).
Automatic Speech Recognition Automatic Speech Recognition (ASR) +4
1 code implementation • 30 May 2023 • Yu-Hsiang Wang, Huang-Yu Chen, Kai-Wei Chang, Winston Hsu, Hung-Yi Lee
In this paper, we introduce MiniSUPERB, a lightweight benchmark that efficiently evaluates SSL speech models with comparable results to SUPERB but lower computational costs significantly.
2 code implementations • 16 Nov 2022 • Yu-Hsiang Wang, Jun-Wei Hsieh, Ping-Yang Chen, Ming-Ching Chang, Hung Hin So, Xin Li
Second, we develop a Similarity Matching Cascade (SMC) module with a novel GATE function for robust object matching across consecutive video frames, further enhancing MOT performance.
Ranked #1 on Multi-Object Tracking on MOT20 (using extra training data)