Search Results for author: Xixin Wu

Found 40 papers, 12 papers with code

Grounded Dialogue Generation with Cross-encoding Re-ranker, Grounding Span Prediction, and Passage Dropout

no code implementations • dialdoc (ACL) 2022 • Kun Li, Tianhua Zhang, Liping Tang, Junan Li, Hongyuan Lu, Xixin Wu, Helen Meng

For the response generator, we use grounding span prediction as an auxiliary task to be jointly trained with the main task of response generation.

Dialogue Generation Response Generation

Paper
Add Code

CLAPSep: Leveraging Contrastive Pre-trained Model for Multi-Modal Query-Conditioned Target Sound Extraction

1 code implementation • 27 Feb 2024 • Hao Ma, Zhiyuan Peng, Xu Li, Mingjie Shao, Xixin Wu, Ju Liu

Universal sound separation (USS) aims to extract arbitrary types of sounds from real-world recordings.

Ranked #1 on Target Sound Extraction on AudioSet

Target Sound Extraction

Paper
Code

UNIT-DSR: Dysarthric Speech Reconstruction System Using Speech Unit Normalization

no code implementations • 26 Jan 2024 • Yuejiao Wang, Xixin Wu, Disong Wang, Lingwei Meng, Helen Meng

Dysarthric speech reconstruction (DSR) systems aim to automatically convert dysarthric speech into normal-sounding speech.

Decoder Domain Adaptation +2

Paper
Add Code

Cross-Speaker Encoding Network for Multi-Talker Speech Recognition

no code implementations • 8 Jan 2024 • Jiawen Kang, Lingwei Meng, Mingyu Cui, Haohan Guo, Xixin Wu, Xunying Liu, Helen Meng

To the best of our knowledge, this work represents an early effort to integrate SIMO and SISO for multi-talker speech recognition.

Decoder speech-recognition +1

Paper
Add Code

SimCalib: Graph Neural Network Calibration based on Similarity between Nodes

no code implementations • 19 Dec 2023 • Boshi Tang, Zhiyong Wu, Xixin Wu, Qiaochu Huang, Jun Chen, Shun Lei, Helen Meng

A novel calibration framework, named SimCalib, is accordingly proposed to consider similarity between nodes at global and local levels.

Paper
Add Code

StyleSpeech: Self-supervised Style Enhancing with VQ-VAE-based Pre-training for Expressive Audiobook Speech Synthesis

no code implementations • 19 Dec 2023 • Xueyuan Chen, Xi Wang, Shaofei Zhang, Lei He, Zhiyong Wu, Xixin Wu, Helen Meng

Both objective and subjective evaluations demonstrate that our proposed method can effectively improve the naturalness and expressiveness of the synthesized speech in audiobook synthesis especially for the role and out-of-domain scenarios.

Decoder Speech Synthesis

Paper
Add Code

Injecting linguistic knowledge into BERT for Dialogue State Tracking

no code implementations • 27 Nov 2023 • Xiaohan Feng, Xixin Wu, Helen Meng

This correlation facilitates a comprehensive understanding of the linguistic features influencing the DST model's decision-making process.

Decision Making Dialogue State Tracking

Paper
Add Code

Natural Language Embedded Programs for Hybrid Language Symbolic Reasoning

1 code implementation • 19 Sep 2023 • Tianhua Zhang, Jiaxin Ge, Hongyin Luo, Yung-Sung Chuang, Mingye Gao, Yuan Gong, Xixin Wu, Yoon Kim, Helen Meng, James Glass

How can we perform computations over natural language representations to solve tasks that require symbolic and numeric reasoning?

Instruction Following Language Modelling +5

Paper
Code

Improving Mandarin Prosodic Structure Prediction with Multi-level Contextual Information

no code implementations • 31 Aug 2023 • Jie Chen, Changhe Song, Deyi Tuo, Xixin Wu, Shiyin Kang, Zhiyong Wu, Helen Meng

For text-to-speech (TTS) synthesis, prosodic structure prediction (PSP) plays an important role in producing natural and intelligible speech.

Decoder Multi-Task Learning

Paper
Add Code

QS-TTS: Towards Semi-Supervised Text-to-Speech Synthesis via Vector-Quantized Self-Supervised Speech Representation Learning

1 code implementation • 31 Aug 2023 • Haohan Guo, Fenglong Xie, Jiawen Kang, Yujia Xiao, Xixin Wu, Helen Meng

This paper proposes a novel semi-supervised TTS framework, QS-TTS, to improve TTS quality with lower supervised data requirements via Vector-Quantized Self-Supervised Speech Representation Learning (VQ-S3RL) utilizing more unlabeled speech audio.

Representation Learning Speech Synthesis +2

156

Paper
Code

Rethinking Machine Ethics -- Can LLMs Perform Moral Reasoning through the Lens of Moral Theories?

no code implementations • 29 Aug 2023 • Jingyan Zhou, Minda Hu, Junan Li, Xiaoying Zhang, Xixin Wu, Irwin King, Helen Meng

Our analysis exhibits the potentials and flaws in existing resources (models and datasets) in developing explainable moral judgment-making systems.

Ethics

Paper
Add Code

Unified Modeling of Multi-Talker Overlapped Speech Recognition and Diarization with a Sidecar Separator

no code implementations • 25 May 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Haibin Wu, Xixin Wu, Helen Meng

Extending on this, we incorporate a diarization branch into the Sidecar, allowing for unified modeling of both ASR and diarization with a negligible overhead of only 768 parameters.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

SAIL: Search-Augmented Instruction Learning

no code implementations • 24 May 2023 • Hongyin Luo, Yung-Sung Chuang, Yuan Gong, Tianhua Zhang, Yoon Kim, Xixin Wu, Danny Fox, Helen Meng, James Glass

Large language models (LLMs) have been significantly improved by instruction fine-tuning, but still lack transparency and the ability to utilize up-to-date knowledge and information.

Denoising Fact Checking +3

Paper
Add Code

Interpretable Unified Language Checking

1 code implementation • 7 Apr 2023 • Tianhua Zhang, Hongyin Luo, Yung-Sung Chuang, Wei Fang, Luc Gaitskell, Thomas Hartvigsen, Xixin Wu, Danny Fox, Helen Meng, James Glass

Despite recent concerns about undesirable behaviors generated by large language models (LLMs), including non-factual, biased, and hateful language, we find LLMs are inherent multi-task language checkers based on their latent representations of natural and social knowledge.

Fact Checking Fairness +2

Paper
Code

A Hierarchical Regression Chain Framework for Affective Vocal Burst Recognition

1 code implementation • 14 Mar 2023 • Jinchao Li, Xixin Wu, Kaitao Song, Dongsheng Li, Xunying Liu, Helen Meng

Experimental results based on the ACII Challenge 2022 dataset demonstrate the superior performance of the proposed system and the effectiveness of considering multiple relationships using hierarchical regression chain models.

Ranked #1 on Vocal Bursts Intensity Prediction on HUME-VB

A-VB Culture A-VB High +6

Paper
Code

Leveraging Pretrained Representations with Task-related Keywords for Alzheimer's Disease Detection

no code implementations • 14 Mar 2023 • Jinchao Li, Kaitao Song, Junan Li, Bo Zheng, Dongsheng Li, Xixin Wu, Xunying Liu, Helen Meng

This paper presents several efficient methods to extract better AD-related cues from high-level acoustic and linguistic features.

Alzheimer's Disease Detection Binary Classification

Paper
Add Code

A Sidecar Separator Can Convert a Single-Talker Speech Recognition System to a Multi-Talker One

no code implementations • 20 Feb 2023 • Lingwei Meng, Jiawen Kang, Mingyu Cui, Yuejiao Wang, Xixin Wu, Helen Meng

Although automatic speech recognition (ASR) can perform well in common non-overlapping environments, sustaining performance in multi-talker overlapping speech recognition remains challenging.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Improving Rare Words Recognition through Homophone Extension and Unified Writing for Low-resource Cantonese Speech Recognition

no code implementations • 2 Feb 2023 • Holam Chung, Junan Li, Pengfei Liu1, Wai-Kim Leung, Xixin Wu, Helen Meng

Homophone characters are common in tonal syllable-based languages, such as Mandarin and Cantonese.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Towards High-Quality Neural TTS for Low-Resource Languages by Learning Compact Speech Representations

1 code implementation • 27 Oct 2022 • Haohan Guo, Fenglong Xie, Xixin Wu, Hui Lu, Helen Meng

Moreover, we optimize the training strategy by leveraging more audio to learn MSMCRs better for low-resource languages.

Transfer Learning

156

Paper
Code

Disentangled Speech Representation Learning for One-Shot Cross-lingual Voice Conversion Using $β$-VAE

no code implementations • 25 Oct 2022 • Hui Lu, Disong Wang, Xixin Wu, Zhiyong Wu, Xunying Liu, Helen Meng

We propose an unsupervised learning method to disentangle speech into content representation and speaker identity representation.

Disentanglement Voice Conversion

Paper
Add Code

A Multi-Stage Multi-Codebook VQ-VAE Approach to High-Performance Neural TTS

1 code implementation • 22 Sep 2022 • Haohan Guo, Fenglong Xie, Frank K. Soong, Xixin Wu, Helen Meng

A vector-quantized, variational autoencoder (VQ-VAE) based feature analyzer is used to encode Mel spectrograms of speech training data by down-sampling progressively in multiple stages into MSMC Representations (MSMCRs) with different time resolutions, and quantizing them with multiple VQ codebooks, respectively.

156

Paper
Code

Exploring linguistic feature and model combination for speech recognition based automatic AD detection

no code implementations • 28 Jun 2022 • Yi Wang, Tianzi Wang, Zi Ye, Lingwei Meng, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Early diagnosis of Alzheimer's disease (AD) is crucial in facilitating preventive care and delay progression.

Model Selection speech-recognition +1

Paper
Add Code

Tackling Spoofing-Aware Speaker Verification with Multi-Model Fusion

no code implementations • 18 Jun 2022 • Haibin Wu, Jiawen Kang, Lingwei Meng, Yang Zhang, Xixin Wu, Zhiyong Wu, Hung-Yi Lee, Helen Meng

However, previous works show that state-of-the-art ASV models are seriously vulnerable to voice spoofing attacks, and the recently proposed high-performance spoofing countermeasure (CM) models only focus solely on the standalone anti-spoofing tasks, and ignore the subsequent speaker verification process.

Open-Ended Question Answering Speaker Verification

Paper
Add Code

Neural Architecture Search for Speech Emotion Recognition

no code implementations • 31 Mar 2022 • Xixin Wu, Shoukang Hu, Zhiyong Wu, Xunying Liu, Helen Meng

Deep neural networks have brought significant advancements to speech emotion recognition (SER).

Neural Architecture Search Speech Emotion Recognition

Paper
Add Code

Spoofing-Aware Speaker Verification by Multi-Level Fusion

no code implementations • 29 Mar 2022 • Haibin Wu, Lingwei Meng, Jiawen Kang, Jinchao Li, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

In the second-level fusion, the CM score and ASV scores directly from ASV systems will be concatenated into a prediction block for the final decision.

Speaker Verification

Paper
Add Code

Estimating the Uncertainty in Emotion Class Labels with Utterance-Specific Dirichlet Priors

no code implementations • 8 Mar 2022 • Wen Wu, Chao Zhang, Xixin Wu, Philip C. Woodland

In this paper, a novel Bayesian training loss based on per-utterance Dirichlet prior distributions is proposed for verbal emotion recognition, which models the uncertainty in one-hot labels created when human annotators assign the same utterance to different emotion classes.

Attribute Emotion Classification +1

Paper
Add Code

Speaker Identity Preservation in Dysarthric Speech Reconstruction by Adversarial Speaker Adaptation

no code implementations • 18 Feb 2022 • Disong Wang, Songxiang Liu, Xixin Wu, Hui Lu, Lifa Sun, Xunying Liu, Helen Meng

The primary task of ASA fine-tunes the SE with the speech of the target dysarthric speaker to effectively capture identity-related information, and the secondary task applies adversarial training to avoid the incorporation of abnormal speaking patterns into the reconstructed speech, by regularizing the distribution of reconstructed speech to be close to that of reference speech with high quality.

Multi-Task Learning Speaker Verification

Paper
Add Code

The CUHK-TENCENT speaker diarization system for the ICASSP 2022 multi-channel multi-party meeting transcription challenge

no code implementations • 4 Feb 2022 • Naijun Zheng, Na Li, Xixin Wu, Lingwei Meng, Jiawen Kang, Haibin Wu, Chao Weng, Dan Su, Helen Meng

This paper describes our speaker diarization system submitted to the Multi-channel Multi-party Meeting Transcription (M2MeT) challenge, where Mandarin meeting data were recorded in multi-channel format for diarization and automatic speech recognition (ASR) tasks.

Action Detection Activity Detection +6

Paper
Add Code

Characterizing the adversarial vulnerability of speech self-supervised learning

no code implementations • 8 Nov 2021 • Haibin Wu, Bo Zheng, Xu Li, Xixin Wu, Hung-Yi Lee, Helen Meng

As the paradigm of the self-supervised learning upstream model followed by downstream tasks arouses more attention in the speech community, characterizing the adversarial robustness of such paradigm is of high priority.

Adversarial Robustness Benchmarking +2

Paper
Add Code

Channel-wise Gated Res2Net: Towards Robust Detection of Synthetic Speech Attacks

2 code implementations • 19 Jul 2021 • Xu Li, Xixin Wu, Hui Lu, Xunying Liu, Helen Meng

This argument motivates the current work that presents a novel, channel-wise gated Res2Net (CG-Res2Net), which modifies Res2Net to enable a channel-wise gating mechanism in the connection between feature groups.

Speaker Verification

Paper
Code

Attention Forcing for Machine Translation

1 code implementation • 2 Apr 2021 • Qingyun Dou, Yiting Lu, Potsawee Manakul, Xixin Wu, Mark J. F. Gales

This approach guides the model with the generated output history and reference attention, and can reduce the training-inference mismatch without a schedule or a classifier.

Machine Translation NMT +1

Paper
Code

Should Ensemble Members Be Calibrated?

no code implementations • 13 Jan 2021 • Xixin Wu, Mark Gales

It is shown that well calibrated ensemble members will not necessarily yield a well calibrated ensemble prediction, and if the ensemble prediction is well calibrated its performance cannot exceed that of the average performance of the calibrated ensemble members.

Image Classification

Paper
Add Code

Learning Explicit Prosody Models and Deep Speaker Embeddings for Atypical Voice Conversion

no code implementations • 3 Nov 2020 • Disong Wang, Songxiang Liu, Lifa Sun, Xixin Wu, Xunying Liu, Helen Meng

Third, a conversion model takes phoneme embeddings and typical prosody features as inputs to generate the converted speech, conditioned on the target DSE that is learned via speaker encoder or speaker adaptation.

speech-recognition Speech Recognition +1

Paper
Add Code

Any-to-Many Voice Conversion with Location-Relative Sequence-to-Sequence Modeling

1 code implementation • 6 Sep 2020 • Songxiang Liu, Yuewen Cao, Disong Wang, Xixin Wu, Xunying Liu, Helen Meng

During the training stage, an encoder-decoder-based hybrid connectionist-temporal-classification-attention (CTC-attention) phoneme recognizer is trained, whose encoder has a bottle-neck layer.

feature selection speech-recognition +2

313

Paper
Code

Investigating Robustness of Adversarial Samples Detection for Automatic Speaker Verification

no code implementations • 11 Jun 2020 • Xu Li, Na Li, Jinghua Zhong, Xixin Wu, Xunying Liu, Dan Su, Dong Yu, Helen Meng

Orthogonal to prior approaches, this work proposes to defend ASV systems against adversarial attacks with a separate detection network, rather than augmenting adversarial data into ASV training.

Binary Classification Data Augmentation +1

Paper
Add Code

Bayesian x-vector: Bayesian Neural Network based x-vector System for Speaker Verification

no code implementations • 8 Apr 2020 • Xu Li, Jinghua Zhong, Jianwei Yu, Shoukang Hu, Xixin Wu, Xunying Liu, Helen Meng

Our experiment results indicate that the DNN x-vector system could benefit from BNNs especially when the mismatch problem is severe for evaluations using out-of-domain data.

Speaker Verification

Paper
Add Code

Deep segmental phonetic posterior-grams based discovery of non-categories in L2 English speech

no code implementations • 1 Feb 2020 • Xu Li, Xixin Wu, Xunying Liu, Helen Meng

And then we explore the non-categories by looking for the SPPGs with more than one peak.

Paper
Add Code

Adversarial Attacks on GMM i-vector based Speaker Verification Systems

2 code implementations • 8 Nov 2019 • Xu Li, Jinghua Zhong, Xixin Wu, Jianwei Yu, Xunying Liu, Helen Meng

Experiment results show that GMM i-vector systems are seriously vulnerable to adversarial attacks, and the crafted adversarial samples prove to be transferable and pose threats to neuralnetwork speaker embedding based systems (e. g. x-vector systems).

Speaker Verification

Paper
Code

Coupling Global and Local Context for Unsupervised Aspect Extraction

no code implementations • IJCNLP 2019 • Ming Liao, Jing Li, Haisong Zhang, Lingzhi Wang, Xixin Wu, Kam-Fai Wong

Aspect words, indicating opinion targets, are essential in expressing and understanding human opinions.

Aspect Extraction Sentence

Paper
Add Code

Maximizing Mutual Information for Tacotron

2 code implementations • 30 Aug 2019 • Peng Liu, Xixin Wu, Shiyin Kang, Guangzhi Li, Dan Su, Dong Yu

End-to-end speech synthesis methods already achieve close-to-human quality performance.

Attribute Speech Synthesis

Paper
Code

Cannot find the paper you are looking for? You can Submit a new open access paper.