no code implementations • 23 May 2023 • Wenbiao Yin, Zhicheng Liu, Chengqi Zhao, Tao Wang, Jian Tong, Rong Ye
To tackle these gaps, we propose \textbf{F}use-\textbf{S}peech-\textbf{T}ext (\textbf{FST}), a cross-modal model which supports three distinct input modalities for translation: speech, text, and fused speech-text.
1 code implementation • ACL (IWSLT) 2021 • Chengqi Zhao, Zhicheng Liu, Jian Tong, Tao Wang, Mingxuan Wang, Rong Ye, Qianqian Dong, Jun Cao, Lei LI
For offline speech translation, our best end-to-end model achieves 8. 1 BLEU improvements over the benchmark on the MuST-C test set and is even approaching the results of a strong cascade solution.