no code implementations • NAACL 2022 • Zheng Fang, Ruiqing Zhang, Zhongjun He, Hua Wu, Yanan Cao
Automatic Speech Recognition (ASR) is an efficient and widely used input method that transcribes speech signals into text.
Automatic Speech Recognition (ASR) +2
no code implementations • EMNLP 2020 • Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang
The policy learns to segment the source text by considering possible translations produced by the translation model, maintaining consistency between the segmentation and translation.
no code implementations • ACL 2022 • Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang
End-to-end simultaneous speech-to-text translation aims to directly perform translation from streaming source speech to target text with high translation quality and low latency.
no code implementations • NAACL (AutoSimTrans) 2021 • Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang
This paper presents the results of the shared task of the 2nd Workshop on Automatic Simultaneous Translation (AutoSimTrans).
no code implementations • NAACL (AutoSimTrans) 2022 • Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang, Liang Huang, Qun Liu, Julia Ive, Wolfgang Macherey
This paper reports the results of the shared task we hosted on the Third Workshop of Automatic Simultaneous Translation (AutoSimTrans).
1 code implementation • 11 Jan 2024 • Pengzhi Gao, Zhongjun He, Hua Wu, Haifeng Wang
The training paradigm for machine translation has gradually shifted, from learning neural machine translation (NMT) models with extensive parallel corpora to instruction finetuning on multilingual large language models (LLMs) with high-quality translation pairs.
1 code implementation • 28 Aug 2023 • Pengzhi Gao, Ruiqing Zhang, Zhongjun He, Hua Wu, Haifeng Wang
Consistency regularization methods, such as R-Drop (Liang et al., 2021) and CrossConST (Gao et al., 2023), have achieved impressive supervised and zero-shot performance in the neural machine translation (NMT) field.
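A minimal toy sketch of the R-Drop idea referenced here: the same input is passed through the model twice with different dropout masks, and a symmetric KL term penalizes disagreement between the two output distributions. This is a pure-Python illustration with a made-up `logits_fn`, not the papers' implementation.

```python
import math
import random

def softmax(logits):
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    z = sum(exps)
    return [e / z for e in exps]

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def dropout(vec, rate, rng):
    # Inverted dropout: zero each unit with prob `rate`, rescale survivors.
    return [0.0 if rng.random() < rate else v / (1 - rate) for v in vec]

def rdrop_loss(logits_fn, x, target, alpha=1.0, rate=0.1, seed=0):
    """Cross-entropy over two stochastic forward passes plus a symmetric
    KL consistency term between their output distributions (R-Drop style)."""
    rng = random.Random(seed)
    p1 = softmax(logits_fn(dropout(x, rate, rng)))
    p2 = softmax(logits_fn(dropout(x, rate, rng)))
    ce = -(math.log(p1[target]) + math.log(p2[target])) / 2
    consistency = (kl(p1, p2) + kl(p2, p1)) / 2
    return ce + alpha * consistency
```

The consistency weight `alpha` and dropout rate are illustrative defaults, not values from the papers.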
1 code implementation • 12 Jun 2023 • Pengzhi Gao, Liwen Zhang, Zhongjun He, Hua Wu, Haifeng Wang
Multilingual sentence representations are the foundation for similarity-based bitext mining, which is crucial for scaling multilingual neural machine translation (NMT) systems to more languages.
1 code implementation • 12 May 2023 • Pengzhi Gao, Liwen Zhang, Zhongjun He, Hua Wu, Haifeng Wang
The experimental analysis also proves that CrossConST could close the sentence representation gap and better align the representation space.
1 code implementation • NAACL 2022 • Pengzhi Gao, Zhongjun He, Hua Wu, Haifeng Wang
We introduce Bi-SimCut: a simple but effective training strategy to boost neural machine translation (NMT) performance.
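The bidirectional pretraining component of such a strategy can be approximated by doubling the parallel corpus with reversed pairs, with a target-language tag prepended so one model learns both directions. A hedged sketch, not the authors' code; the tag tokens are assumptions for illustration.

```python
def make_bidirectional(pairs, src_tag="<en>", tgt_tag="<de>"):
    """Build a bidirectional training set from (src, tgt) pairs by adding
    the reversed direction, with a target-language tag prepended to the
    source side so a single model can translate both ways."""
    data = []
    for src, tgt in pairs:
        data.append((f"{tgt_tag} {src}", tgt))  # forward: src -> tgt
        data.append((f"{src_tag} {tgt}", src))  # backward: tgt -> src
    return data
```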
Ranked #1 on Machine Translation on WMT2014 German-English
no code implementations • Findings (EMNLP) 2021 • Jicheng Li, Pengzhi Gao, Xuanfu Wu, Yang Feng, Zhongjun He, Hua Wu, Haifeng Wang
To further improve the faithfulness and diversity of the translations, we propose two simple but effective approaches to select diverse sentence pairs in the training corpus and adjust the interpolation weight for each pair correspondingly.
no code implementations • NAACL (AutoSimTrans) 2021 • Ruiqing Zhang, Xiyang Wang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Zhi Li, Haifeng Wang, Ying Chen, Qinfei Li
This corpus is expected to promote the research of automatic simultaneous translation as well as the development of practical systems.
Automatic Speech Recognition (ASR) +2
no code implementations • 1 Jan 2021 • Chenze Shao, Meng Sun, Yang Feng, Zhongjun He, Hua Wu, Haifeng Wang
Under this framework, we introduce word-level ensemble learning and sequence-level ensemble learning for neural machine translation, where sequence-level ensemble learning is capable of aggregating translation models with different decoding strategies.
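The two combination levels described here can be sketched in a toy form: word-level ensembling averages per-step token distributions across models, while sequence-level ensembling rescores whole candidate translations with each model and keeps the best average. This is an illustrative simplification, not the paper's framework.

```python
def word_level_ensemble(step_dists):
    """Average per-token distributions from several models at one decoding
    step, then pick the argmax token (word-level combination)."""
    n = len(step_dists)
    vocab = len(step_dists[0])
    avg = [sum(d[i] for d in step_dists) / n for i in range(vocab)]
    return max(range(vocab), key=lambda i: avg[i]), avg

def sequence_level_ensemble(candidates, scorers):
    """Rescore whole candidate translations with every model's sequence
    log-probability and return the candidate with the best average score;
    this allows combining models that use different decoding strategies."""
    def avg_score(seq):
        return sum(s(seq) for s in scorers) / len(scorers)
    return max(candidates, key=avg_score)
```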
no code implementations • EMNLP 2020 • Liang Huang, Colin Cherry, Mingbo Ma, Naveen Arivazhagan, Zhongjun He
Simultaneous translation, which performs translation concurrently with the source speech, is widely useful in many scenarios such as international conferences, negotiations, press releases, legal proceedings, and medicine.
1 code implementation • 16 Dec 2019 • Yuchen Liu, Jiajun Zhang, Hao Xiong, Long Zhou, Zhongjun He, Hua Wu, Haifeng Wang, Cheng-qing Zong
Speech-to-text translation (ST), which translates source language speech into target language text, has attracted intensive attention in recent years.
Automatic Speech Recognition (ASR) +4
no code implementations • IJCNLP 2019 • Tianchi Bi, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang
Conventional Neural Machine Translation (NMT) models benefit from training with an additional agent, e.g., dual learning, and bidirectional decoding with one agent decoding from left to right and the other decoding in the opposite direction.
no code implementations • WS 2019 • Meng Sun, Bojian Jiang, Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang
In this paper we introduce the systems Baidu submitted for the WMT19 shared task on Chinese↔English news translation.
no code implementations • 30 Jul 2019 • Hao Xiong, Ruiqing Zhang, Chuanqiang Zhang, Zhongjun He, Hua Wu, Haifeng Wang
In this paper, we present DuTongChuan, a novel context-aware translation model for simultaneous interpreting.
Automatic Speech Recognition (ASR) +2
no code implementations • 17 Apr 2019 • Yuchen Liu, Hao Xiong, Zhongjun He, Jiajun Zhang, Hua Wu, Haifeng Wang, Cheng-qing Zong
End-to-end speech translation (ST), which directly translates from source language speech into target language text, has attracted intensive attention in recent years.
no code implementations • 14 Nov 2018 • Hao Xiong, Zhongjun He, Hua Wu, Haifeng Wang
Discourse coherence plays an important role in the translation of a text.
3 code implementations • ACL 2019 • Mingbo Ma, Liang Huang, Hao Xiong, Renjie Zheng, Kaibo Liu, Baigong Zheng, Chuanqiang Zhang, Zhongjun He, Hairong Liu, Xing Li, Hua Wu, Haifeng Wang
Simultaneous translation, which translates sentences before they are finished, is useful in many scenarios but is notoriously difficult due to word-order differences.
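Translating before the sentence is finished can be sketched with the wait-k read/write schedule: read k source tokens first, then alternate one write per read until the source is exhausted. The sketch below only produces the action sequence of such a policy, assuming known source and target lengths; the actual model predicts tokens prefix-to-prefix.

```python
def wait_k_schedule(src_len, tgt_len, k):
    """Produce the READ/WRITE action sequence of a wait-k policy: read k
    source tokens first, then alternate one WRITE per READ until the
    source is exhausted, then write the remaining target tokens."""
    actions, read, written = [], 0, 0
    while written < tgt_len:
        if read < min(k + written, src_len):
            actions.append("READ")
            read += 1
        else:
            actions.append("WRITE")
            written += 1
    return actions
```

For src_len = tgt_len = 5 and k = 2, the schedule reads two tokens before the first write, then interleaves reads and writes, keeping latency roughly k tokens behind the speaker.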
no code implementations • ACL 2019 • Hairong Liu, Mingbo Ma, Liang Huang, Hao Xiong, Zhongjun He
Neural machine translation (NMT) is notoriously sensitive to noise, but noise is almost inevitable in practice.
no code implementations • EMNLP 2018 • Yang Zhao, Jiajun Zhang, Zhongjun He, Cheng-qing Zong, Hua Wu
One of the weaknesses of Neural Machine Translation (NMT) is in handling low-frequency and ambiguous words, which we refer to as troublesome words.
no code implementations • 6 Dec 2017 • Hao Xiong, Zhongjun He, Xiaoguang Hu, Hua Wu
This encoder design yields a relatively uniform composition of the source sentence, despite the gating mechanism employed in the encoding RNN.
no code implementations • ACL 2016 • Yong Cheng, Wei Xu, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
While end-to-end neural machine translation (NMT) has made remarkable progress recently, NMT systems only rely on parallel corpora for parameter estimation.
1 code implementation • 15 Dec 2015 • Yong Cheng, Shiqi Shen, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
The attentional mechanism has proven to be effective in improving end-to-end neural machine translation.
1 code implementation • ACL 2016 • Shiqi Shen, Yong Cheng, Zhongjun He, Wei He, Hua Wu, Maosong Sun, Yang Liu
We propose minimum risk training for end-to-end neural machine translation.
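The minimum risk training objective can be sketched as the expected risk over a sampled candidate set, with the model's probabilities renormalized over those candidates and sharpened by a temperature-like factor. A toy sketch in the spirit of Shen et al. (2016); the sharpening value `alpha` is an illustrative assumption.

```python
import math

def mrt_loss(log_probs, risks, alpha=0.005):
    """Minimum risk training objective over a sampled candidate set:
    expected risk (e.g. 1 - BLEU) under the model's distribution,
    renormalized over the candidates and sharpened by `alpha`."""
    scaled = [alpha * lp for lp in log_probs]
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    z = sum(exps)
    q = [e / z for e in exps]
    return sum(qi * ri for qi, ri in zip(q, risks))
```

Minimizing this pushes probability mass toward candidates with low risk, directly optimizing the evaluation metric rather than likelihood.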