Search Results for author: Haizhou Li

Found 189 papers, 76 papers with code

Mamba in Speech: Towards an Alternative to Self-Attention

no code implementations • 21 May 2024 • Xiangyu Zhang, Qiquan Zhang, Hexin Liu, Tianyi Xiao, Xinyuan Qian, Beena Ahmed, Eliathamby Ambikairajah, Haizhou Li, Julien Epps

Moreover, experiments demonstrate the effectiveness of BiMamba as an alternative to the self-attention module in Transformer and its derivates, particularly for the semantic-aware task.

Paper
Add Code

Incorporating External Knowledge and Goal Guidance for LLM-based Conversational Recommender Systems

no code implementations • 3 May 2024 • Chuang Li, Yang Deng, Hengchang Hu, Min-Yen Kan, Haizhou Li

This paper aims to efficiently enable large language models (LLMs) to use external knowledge and goal guidance in conversational recommender system (CRS) tasks.

Informativeness Recommendation Systems

Paper
Add Code

Audio-Visual Target Speaker Extraction with Reverse Selective Auditory Attention

no code implementations • 29 Apr 2024 • Ruijie Tao, Xinyuan Qian, Yidi Jiang, Junjie Li, Jiadong Wang, Haizhou Li

To this end, we propose a novel reverse selective auditory attention mechanism, which can suppress interference speakers and non-speech signals to avoid incorrect speaker extraction.

Target Speaker Extraction

Paper
Add Code

Enhancing Real-World Active Speaker Detection with Multi-Modal Extraction Pre-Training

no code implementations • 1 Apr 2024 • Ruijie Tao, Xinyuan Qian, Rohan Kumar Das, Xiaoxue Gao, Jiadong Wang, Haizhou Li

Audio-visual active speaker detection (AV-ASD) aims to identify which visible face is speaking in a scene with one or more persons.

Audio-Visual Active Speaker Detection Denoising +1

Paper
Add Code

Voice Conversion Augmentation for Speaker Recognition on Defective Datasets

no code implementations • 1 Apr 2024 • Ruijie Tao, Zhan Shi, Yidi Jiang, Tianchi Liu, Haizhou Li

Our experimental results on three created datasets demonstrated that VCA-NN effectively mitigates these dataset problems, which provides a new direction for handling the speaker recognition problems from the data aspect.

Speaker Recognition Voice Conversion

Paper
Add Code

CrossTune: Black-Box Few-Shot Classification with Label Enhancement

no code implementations • 19 Mar 2024 • Danqing Luo, Chen Zhang, Yan Zhang, Haizhou Li

Training or finetuning large-scale language models (LLMs) requires substantial computation resources, motivating recent efforts to explore parameter-efficient adaptation to downstream tasks.

Few-Shot Text Classification In-Context Learning +2

Paper
Add Code

sVAD: A Robust, Low-Power, and Light-Weight Voice Activity Detection with Spiking Neural Networks

no code implementations • 9 Mar 2024 • Qu Yang, Qianhui Liu, Nan Li, Meng Ge, Zeyang Song, Haizhou Li

Spiking Neural Networks (SNNs) are known to be biologically plausible and power-efficient.

Action Detection Activity Detection

Paper
Add Code

Apollo: An Lightweight Multilingual Medical LLM towards Democratizing Medical AI to 6B People

1 code implementation • 6 Mar 2024 • Xidong Wang, Nuo Chen, Junyin Chen, Yan Hu, Yidong Wang, Xiangbo Wu, Anningzhe Gao, Xiang Wan, Haizhou Li, Benyou Wang

Despite the vast repository of global medical knowledge predominantly being in English, local languages are crucial for delivering tailored healthcare services, particularly in areas with limited medical resources.

138

Paper
Code

Event-Driven Learning for Spiking Neural Networks

no code implementations • 1 Mar 2024 • Wenjie Wei, Malu Zhang, Jilin Zhang, Ammar Belatreche, Jibin Wu, Zijing Xu, Xuerui Qiu, Hong Chen, Yang Yang, Haizhou Li

Specifically, we introduce two novel event-driven learning methods: the spike-timing-dependent event-driven (STD-ED) and membrane-potential-dependent event-driven (MPD-ED) algorithms.

Paper
Add Code

Text-guided HuBERT: Self-Supervised Speech Pre-training via Generative Adversarial Networks

no code implementations • 24 Feb 2024 • Duo Ma, Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

In this paper, we investigate a new way to pre-train such a joint speech-text model to learn enhanced speech representations and benefit various speech-related downstream tasks.

Pseudo Label Self-Supervised Learning

Paper
Add Code

Computation and Parameter Efficient Multi-Modal Fusion Transformer for Cued Speech Recognition

no code implementations • 31 Jan 2024 • Lei Liu, Li Liu, Haizhou Li

Cued Speech (CS) is a pure visual coding method used by hearing-impaired people that combines lip reading with several specific hand shapes to make the spoken language visible.

Lip Reading speech-recognition +1

Paper
Add Code

LitE-SNN: Designing Lightweight and Efficient Spiking Neural Network through Spatial-Temporal Compressive Network Search and Joint Optimization

no code implementations • 26 Jan 2024 • Qianhui Liu, Jiaqi Yan, Malu Zhang, Gang Pan, Haizhou Li

Spiking Neural Networks (SNNs) mimic the information-processing mechanisms of the human brain and are highly energy-efficient, making them well-suited for low-power edge devices.

Quantization

Paper
Add Code

CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing

no code implementations • 22 Jan 2024 • Xianghu Yue, Xiaohai Tian, Lu Lu, Malu Zhang, Zhizheng Wu, Haizhou Li

To bridge the gap between modalities, CoAVT employs a query encoder, which contains a set of learnable query embeddings, and extracts the most informative audiovisual features of the corresponding text.

AudioCaps Audio-Visual Synchronization +4

Paper
Add Code

An Empirical Study on the Impact of Positional Encoding in Transformer-based Monaural Speech Enhancement

no code implementations • 18 Jan 2024 • Qiquan Zhang, Meng Ge, Hongxu Zhu, Eliathamby Ambikairajah, Qi Song, Zhaoheng Ni, Haizhou Li

Transformer architecture has enabled recent progress in speech enhancement.

POS Position +1

Paper
Add Code

Bridging Research and Readers: A Multi-Modal Automated Academic Papers Interpretation System

1 code implementation • 17 Jan 2024 • Feng Jiang, Kuang Wang, Haizhou Li

In the contemporary information era, significantly accelerated by the advent of Large-scale Language Models, the proliferation of scientific literature is reaching unprecedented levels.

Paper
Code

The NUS-HLT System for ICASSP2024 ICMC-ASR Grand Challenge

no code implementations • 26 Dec 2023 • Meng Ge, Yizhou Peng, Yidi Jiang, Jingru Lin, Junyi Ao, Mehmet Sinan Yildirim, Shuai Wang, Haizhou Li, Mengling Feng

This paper summarizes our team's efforts in both tracks of the ICMC-ASR Challenge for in-car multi-channel automatic speech recognition.

Automatic Speech Recognition Data Augmentation +2

Paper
Add Code

A Comprehensive Analysis of the Effectiveness of Large Language Models as Automatic Dialogue Evaluators

1 code implementation • 24 Dec 2023 • Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Malu Zhang, Haizhou Li

Yet, existing works on utilizing LLMs for automatic dialogue evaluation are limited in their scope in terms of the number of meta-evaluation datasets, mode of evaluation, coverage of LLMs, etc.

Dialogue Evaluation

Paper
Code

Emotion Rendering for Conversational Speech Synthesis with Heterogeneous Graph-Based Context Modeling

1 code implementation • 19 Dec 2023 • Rui Liu, Yifan Hu, Yi Ren, Xiang Yin, Haizhou Li

Conversational Speech Synthesis (CSS) aims to accurately express an utterance with the appropriate prosody and emotional inflection within a conversational setting.

Contrastive Learning Speech Synthesis

Paper
Code

Golden Gemini is All You Need: Finding the Sweet Spots for Speaker Verification

1 code implementation • 6 Dec 2023 • Tianchi Liu, Kong Aik Lee, Qiongqiong Wang, Haizhou Li

We represent the stride space on a trellis diagram, and conduct a systematic study on the impact of temporal and frequency resolutions on the performance and further identify two optimal points, namely Golden Gemini, which serves as a guiding principle for designing 2D ResNet-based speaker verification models.

Speaker Verification

566

Paper
Code

HuatuoGPT-II, One-stage Training for Medical Adaption of LLMs

1 code implementation • 16 Nov 2023 • Junying Chen, Xidong Wang, Anningzhe Gao, Feng Jiang, Shunian Chen, Hongbo Zhang, Dingjie Song, Wenya Xie, Chuyi Kong, Jianquan Li, Xiang Wan, Haizhou Li, Benyou Wang

We validate the new protocol in the domains where proprietary LLMs like ChatGPT perform relatively poorly, such as Traditional Chinese Medicine.

Domain Adaptation Language Modelling

260

Paper
Code

How Well Do Text Embedding Models Understand Syntax?

1 code implementation • 14 Nov 2023 • Yan Zhang, Zhaopeng Feng, Zhiyang Teng, Zuozhu Liu, Haizhou Li

Text embedding models have significantly contributed to advancements in natural language processing by adeptly capturing semantic properties of textual data.

Paper
Code

Selective HuBERT: Self-Supervised Pre-Training for Target Speaker in Clean and Mixture Speech

no code implementations • 8 Nov 2023 • Jingru Lin, Meng Ge, Wupeng Wang, Haizhou Li, Mengling Feng

Self-supervised pre-trained speech models were shown effective for various downstream speech processing tasks.

Paper
Add Code

LC-TTFS: Towards Lossless Network Conversion for Spiking Neural Networks with TTFS Coding

no code implementations • 23 Oct 2023 • Qu Yang, Malu Zhang, Jibin Wu, Kay Chen Tan, Haizhou Li

With TTFS coding, we can achieve up to orders of magnitude saving in computation over ANN and other rate-based SNNs.

Edge-computing Image Classification +2

Paper
Add Code

Prompt-driven Target Speech Diarization

no code implementations • 23 Oct 2023 • Yidi Jiang, Zhengyang Chen, Ruijie Tao, Liqun Deng, Yanmin Qian, Haizhou Li

We introduce a novel task named `target speech diarization', which seeks to determine `when target event occurred' within an audio signal.

Action Detection Activity Detection

Paper
Add Code

Quantifying Self-diagnostic Atomic Knowledge in Chinese Medical Foundation Model: A Computational Analysis

1 code implementation • 18 Oct 2023 • Yaxin Fan, Feng Jiang, Benyou Wang, Peifeng Li, Haizhou Li

Recent studies primarily focused on the quality of FMs evaluated by GPT-4 or their ability to pass medical exams, no studies have quantified the extent of self-diagnostic atomic knowledge stored in FMs' memory, which is the basis of foundation models to provide factual and reliable suggestions.

Instruction Following

Paper
Code

LocSelect: Target Speaker Localization with an Auditory Selective Hearing Mechanism

no code implementations • 16 Oct 2023 • Yu Chen, Xinyuan Qian, Zexu Pan, Kainan Chen, Haizhou Li

The prevailing noise-resistant and reverberation-resistant localization algorithms primarily emphasize separating and providing directional output for each speaker in multi-speaker scenarios, without association with the identity of speakers.

Paper
Add Code

UNO-DST: Leveraging Unlabelled Data in Zero-Shot Dialogue State Tracking

1 code implementation • 16 Oct 2023 • Chuang Li, Yan Zhang, Min-Yen Kan, Haizhou Li

Previous zero-shot dialogue state tracking (DST) methods only apply transfer learning, ignoring unlabelled data in the target domain.

Dialogue State Tracking Transfer Learning

Paper
Code

xDial-Eval: A Multilingual Open-Domain Dialogue Evaluation Benchmark

1 code implementation • 13 Oct 2023 • Chen Zhang, Luis Fernando D'Haro, Chengguang Tang, Ke Shi, Guohua Tang, Haizhou Li

The English dialogue data are extended to nine other languages with commercial machine translation systems.

Dialogue Evaluation Machine Translation

Paper
Code

Leveraging In-the-Wild Data for Effective Self-Supervised Pretraining in Speaker Recognition

1 code implementation • 21 Sep 2023 • Shuai Wang, Qibing Bai, Qi Liu, Jianwei Yu, Zhengyang Chen, Bing Han, Yanmin Qian, Haizhou Li

Current speaker recognition systems primarily rely on supervised approaches, constrained by the scale of labeled datasets.

Speaker Recognition

566

Paper
Code

AceGPT, Localizing Large Language Models in Arabic

1 code implementation • 21 Sep 2023 • Huang Huang, Fei Yu, Jianqing Zhu, Xuening Sun, Hao Cheng, Dingjie Song, Zhihong Chen, Abdulmohsen Alharthi, Bang An, Juncai He, Ziche Liu, Zhiyi Zhang, Junying Chen, Jianquan Li, Benyou Wang, Lian Zhang, Ruoyu Sun, Xiang Wan, Haizhou Li, Jinchao Xu

This paper is devoted to the development of a localized Large Language Model (LLM) specifically for Arabic, a language imbued with unique cultural characteristics inadequately addressed by current mainstream models.

Instruction Following Language Modelling +2

Paper
Code

Emotion-Aware Prosodic Phrasing for Expressive Text-to-Speech

1 code implementation • 21 Sep 2023 • Rui Liu, Bin Liu, Haizhou Li

Prosodic phrasing is crucial to the naturalness and intelligibility of end-to-end Text-to-Speech (TTS).

Paper
Code

FluentEditor: Text-based Speech Editing by Considering Acoustic and Prosody Consistency

1 code implementation • 21 Sep 2023 • Rui Liu, Jiatian Xi, Ziyue Jiang, Haizhou Li

Text-based speech editing (TSE) techniques are designed to enable users to edit the output audio by modifying the input text transcript instead of the audio itself.

Paper
Code

Spiking-LEAF: A Learnable Auditory front-end for Spiking Neural Networks

no code implementations • 18 Sep 2023 • Zeyang Song, Jibin Wu, Malu Zhang, Mike Zheng Shou, Haizhou Li

Brain-inspired spiking neural networks (SNNs) have demonstrated great potential for temporal signal processing.

Keyword Spotting Speaker Identification

Paper
Add Code

A Conversation is Worth A Thousand Recommendations: A Survey of Holistic Conversational Recommender Systems

1 code implementation • 14 Sep 2023 • Chuang Li, Hengchang Hu, Yan Zhang, Min-Yen Kan, Haizhou Li

However, not all CRS approaches use human conversations as their source of interaction data; the majority of prior CRS work simulates interactions by exchanging entity-level information.

Language Modelling Recommendation Systems

Paper
Code

EEG-Derived Voice Signature for Attended Speaker Detection

no code implementations • 28 Aug 2023 • Hongxu Zhu, Siqi Cai, Yidi Jiang, Qiquan Zhang, Haizhou Li

\textit{Conclusion:} We conclude that it is possible to derive the attended speaker's voice signature from the EEG signals so as to detect the attended speaker in a listening brain.

EEG

Paper
Add Code

TC-LIF: A Two-Compartment Spiking Neuron Model for Long-Term Sequential Modelling

1 code implementation • 25 Aug 2023 • Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan

The identification of sensory cues associated with potential opportunities and dangers is frequently complicated by unrelated events that separate useful cues by long delays.

Paper
Code

CMB: A Comprehensive Medical Benchmark in Chinese

1 code implementation • 17 Aug 2023 • Xidong Wang, Guiming Hardy Chen, Dingjie Song, Zhiyi Zhang, Zhihong Chen, Qingying Xiao, Feng Jiang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li

We hope this benchmark provide first-hand experience in existing LLMs for medicine and also facilitate the widespread adoption and enhancement of medical LLMs within China.

Paper
Code

NeuroHeed: Neuro-Steered Speaker Extraction using EEG Signals

no code implementations • 26 Jul 2023 • Zexu Pan, Marvin Borsdorf, Siqi Cai, Tanja Schultz, Haizhou Li

We propose both an offline and an online NeuroHeed, with the latter designed for real-time inference.

EEG

Paper
Add Code

GrammarGPT: Exploring Open-Source LLMs for Native Chinese Grammatical Error Correction with Supervised Fine-Tuning

1 code implementation • 26 Jul 2023 • Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li

Although model parameters are 20x larger than the SOTA baseline, the required amount of data for instruction tuning is 1200x smaller, illustrating the potential of open-source LLMs on native CGEC.

Grammatical Error Correction

141

Paper
Code

Is ChatGPT Involved in Texts? Measure the Polish Ratio to Detect ChatGPT-Generated Text

2 code implementations • 21 Jul 2023 • Lingyi Yang, Feng Jiang, Haizhou Li

Despite this, most previous studies have been predominantly geared towards creating detectors that differentiate between purely ChatGPT-generated texts and human-authored texts.

Misinformation Text Generation

Paper
Code

Self-Supervised Acoustic Word Embedding Learning via Correspondence Transformer Encoder

no code implementations • 19 Jul 2023 • Jingru Lin, Xianghu Yue, Junyi Ao, Haizhou Li

We train the model based on the idea that different realisations of the same word should be close in the underlying embedding space.

Word Embeddings

Paper
Add Code

Long Short-term Memory with Two-Compartment Spiking Neuron

no code implementations • 14 Jul 2023 • Shimin Zhang, Qu Yang, Chenxiang Ma, Jibin Wu, Haizhou Li, Kay Chen Tan

The identification of sensory cues associated with potential opportunities and dangers is frequently complicated by unrelated events that separate useful cues by long delays.

Paper
Add Code

High-Quality Automatic Voice Over with Accurate Alignment: Supervision through Self-Supervised Discrete Speech Units

no code implementations • 29 Jun 2023 • Junchen Lu, Berrak Sisman, Mingyang Zhang, Haizhou Li

The goal of Automatic Voice Over (AVO) is to generate speech in sync with a silent video given its text script.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

A Hybrid Neural Coding Approach for Pattern Recognition with Spiking Neural Networks

1 code implementation • 26 May 2023 • Xinyi Chen, Qu Yang, Jibin Wu, Haizhou Li, Kay Chen Tan

As an initial exploration in this direction, we propose a hybrid neural coding and learning framework, which encompasses a neural coding zoo with diverse neural coding schemes discovered in neuroscience.

Image Classification

Paper
Code

Betray Oneself: A Novel Audio DeepFake Detection Model via Mono-to-Stereo Conversion

1 code implementation • 25 May 2023 • Rui Liu, Jinhua Zhang, Guanglai Gao, Haizhou Li

In this paper, we propose a novel ADD model, termed as M2S-ADD, that attempts to discover audio authenticity cues during the mono-to-stereo conversion process.

DeepFake Detection Face Swapping +1

Paper
Code

Advancing Topic Segmentation and Outline Generation in Chinese Texts: The Paragraph-level Topic Representation, Corpus, and Benchmark

1 code implementation • 24 May 2023 • Feng Jiang, Weihao Liu, Xiaomin Chu, Peifeng Li, Qiaoming Zhu, Haizhou Li

Topic segmentation and outline generation strive to divide a document into coherent topic sections and generate corresponding subheadings, unveiling the discourse topic structure of a document.

Discourse Parsing Information Retrieval +2

Paper
Code

HuatuoGPT, towards Taming Language Model to Be a Doctor

1 code implementation • 24 May 2023 • Hongbo Zhang, Junying Chen, Feng Jiang, Fei Yu, Zhihong Chen, Jianquan Li, Guiming Chen, Xiangbo Wu, Zhiyi Zhang, Qingying Xiao, Xiang Wan, Benyou Wang, Haizhou Li

Experimental results demonstrate that HuatuoGPT achieves state-of-the-art results in performing medical consultation among open-source LLMs in GPT-4 evaluation, human evaluation, and medical benchmark datasets.

Language Modelling Large Language Model

957

Paper
Code

Topic-driven Distant Supervision Framework for Macro-level Discourse Parsing

no code implementations • 23 May 2023 • Feng Jiang, Longwang He, Peifeng Li, Qiaoming Zhu, Haizhou Li

Discourse parsing, the task of analyzing the internal rhetorical structure of texts, is a challenging problem in natural language processing.

Discourse Parsing Transfer Learning

Paper
Add Code

Enhancing Black-Box Few-Shot Text Classification with Prompt-Based Data Augmentation

no code implementations • 23 May 2023 • Danqing Luo, Chen Zhang, Jiahui Xu, Bin Wang, Yiming Chen, Yan Zhang, Haizhou Li

To achieve this, we treat the black-box model as a feature extractor and train a classifier with the augmented text data.

Data Augmentation Few-Shot Text Classification +2

Paper
Add Code

Target Active Speaker Detection with Audio-visual Cues

1 code implementation • 22 May 2023 • Yidi Jiang, Ruijie Tao, Zexu Pan, Haizhou Li

To benefit from both facial cue and reference speech, we propose the Target Speaker TalkNet (TS-TalkNet), which leverages a pre-enrolled speaker embedding to complement the audio-visual synchronization cue in detecting whether the target speaker is speaking.

Audio-Visual Synchronization

Paper
Code

Dynamic Transformers Provide a False Sense of Efficiency

1 code implementation • 20 May 2023 • Yiming Chen, Simin Chen, Zexin Li, Wei Yang, Cong Liu, Robby T. Tan, Haizhou Li

Despite much success in natural language processing (NLP), pre-trained language models typically lead to a high computational cost during inference.

Adversarial Attack

Paper
Code

Uncovering the Potential of ChatGPT for Discourse Analysis in Dialogue: An Empirical Study

1 code implementation • 15 May 2023 • Yaxin Fan, Feng Jiang, Peifeng Li, Haizhou Li

In this paper, we aim to systematically inspect ChatGPT's performance in two discourse analysis tasks: topic segmentation and discourse parsing, focusing on its deep semantic understanding of linear and hierarchical discourse structures underlying dialogue.

Discourse Parsing In-Context Learning +2

Paper
Code

Accented Text-to-Speech Synthesis with Limited Data

no code implementations • 8 May 2023 • Xuehao Zhou, Mingyang Zhang, Yi Zhou, Zhizheng Wu, Haizhou Li

Both objective and subjective evaluation results show that the accented TTS front-end fine-tuned with a small accented phonetic lexicon (5k words) effectively handles the phonetic variation of accents, while the accented TTS acoustic model fine-tuned with a limited amount of accented speech data (approximately 3 minutes) effectively improves the prosodic rendering including pitch and duration.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

Phoenix: Democratizing ChatGPT across Languages

1 code implementation • 20 Apr 2023 • Zhihong Chen, Feng Jiang, Junying Chen, Tiannan Wang, Fei Yu, Guiming Chen, Hongbo Zhang, Juhao Liang, Chen Zhang, Zhiyi Zhang, Jianquan Li, Xiang Wan, Benyou Wang, Haizhou Li

This paper presents our efforts to democratize ChatGPT across language.

Language Modelling Large Language Model

2,881

Paper
Code

Seeing What You Said: Talking Face Generation Guided by a Lip Reading Expert

1 code implementation • CVPR 2023 • Jiadong Wang, Xinyuan Qian, Malu Zhang, Robby T. Tan, Haizhou Li

To address the problem, we propose using a lip-reading expert to improve the intelligibility of the generated lip regions by penalizing the incorrect generation results.

Contrastive Learning Lip Reading +1

367

Paper
Code

TTS-Guided Training for Accent Conversion Without Parallel Data

no code implementations • 20 Dec 2022 • Yi Zhou, Zhizheng Wu, Mingyang Zhang, Xiaohai Tian, Haizhou Li

Specifically, a text-to-speech (TTS) system is first pretrained with target-accented speech data.

Decoder

Paper
Add Code

PoE: a Panel of Experts for Generalized Automatic Dialogue Assessment

no code implementations • 18 Dec 2022 • Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li

To tackle the multi-domain dialogue evaluation task, we propose a Panel of Experts (PoE), a multitask network that consists of a shared transformer encoder and a collection of lightweight adapters.

Data Augmentation Dialogue Evaluation +4

Paper
Add Code

Relational Sentence Embedding for Flexible Semantic Matching

1 code implementation • 17 Dec 2022 • Bin Wang, Haizhou Li

We present Relational Sentence Embedding (RSE), a new paradigm to further discover the potential of sentence embeddings.

Relation Semantic Textual Similarity +3

Paper
Code

Minimizing the Accumulated Trajectory Error to Improve Dataset Distillation

3 code implementations • CVPR 2023 • Jiawei Du, Yidi Jiang, Vincent Y. F. Tan, Joey Tianyi Zhou, Haizhou Li

To mitigate the adverse impact of this accumulated trajectory error, we propose a novel approach that encourages the optimization algorithm to seek a flat trajectory.

Neural Architecture Search

1,201

Paper
Code

Self-Transriber: Few-shot Lyrics Transcription with Self-training

no code implementations • 18 Nov 2022 • Xiaoxue Gao, Xianghu Yue, Haizhou Li

The current lyrics transcription approaches heavily rely on supervised learning with labeled data, but such data are scarce and manual labeling of singing is expensive.

Few-Shot Learning

Paper
Add Code

I4U System Description for NIST SRE'20 CTS Challenge

no code implementations • 2 Nov 2022 • Kong Aik Lee, Tomi Kinnunen, Daniele Colibro, Claudio Vair, Andreas Nautsch, Hanwu Sun, Liang He, Tianyu Liang, Qiongqiong Wang, Mickael Rouvier, Pierre-Michel Bousquet, Rohan Kumar Das, Ignacio Viñals Bailo, Meng Liu, Héctor Deldago, Xuechen Liu, Md Sahidullah, Sandro Cumani, Boning Zhang, Koji Okabe, Hitoshi Yamamoto, Ruijie Tao, Haizhou Li, Alfonso Ortega Giménez, Longbiao Wang, Luis Buera

This manuscript describes the I4U submission to the 2020 NIST Speaker Recognition Evaluation (SRE'20) Conversational Telephone Speech (CTS) Challenge.

Speaker Recognition

Paper
Add Code

ImagineNET: Target Speaker Extraction with Intermittent Visual Cue through Embedding Inpainting

1 code implementation • 31 Oct 2022 • Zexu Pan, Wupeng Wang, Marvin Borsdorf, Haizhou Li

In this paper, we study the audio-visual speaker extraction algorithms with intermittent visual cue.

Target Speaker Extraction

Paper
Code

Generate, Discriminate and Contrast: A Semi-Supervised Sentence Representation Learning Framework

1 code implementation • 30 Oct 2022 • Yiming Chen, Yan Zhang, Bin Wang, Zuozhu Liu, Haizhou Li

Most sentence embedding techniques heavily rely on expensive human-annotated sentence pairs as the supervised signals.

Domain Adaptation Sentence +3

Paper
Code

token2vec: A Joint Self-Supervised Pre-training Framework Using Unpaired Speech and Text

no code implementations • 30 Oct 2022 • Xianghu Yue, Junyi Ao, Xiaoxue Gao, Haizhou Li

Firstly, due to the distinct characteristics between speech and text modalities, where speech is continuous while text is discrete, we first discretize speech into a sequence of discrete speech tokens to solve the modality mismatch problem.

intent-classification Intent Classification +1

Paper
Add Code

Speaker recognition with two-step multi-modal deep cleansing

1 code implementation • 28 Oct 2022 • Ruijie Tao, Kong Aik Lee, Zhan Shi, Haizhou Li

However, noisy samples (i. e., with wrong labels) in the training set induce confusion and cause the network to learn the incorrect representation.

Representation Learning Speaker Recognition +1

Paper
Code

Self-Supervised Training of Speaker Encoder with Multi-Modal Diverse Positive Pairs

no code implementations • 27 Oct 2022 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

We study a novel neural architecture and its training strategies of speaker encoder for speaker recognition without using any identity labels.

Contrastive Learning Self-Supervised Learning +1

Paper
Add Code

FCTalker: Fine and Coarse Grained Context Modeling for Expressive Conversational Speech Synthesis

1 code implementation • 27 Oct 2022 • Yifan Hu, Rui Liu, Guanglai Gao, Haizhou Li

Therefore, we propose a novel expressive conversational TTS model, termed as FCTalker, that learn the fine and coarse grained context dependency at the same time during speech generation.

Speech Synthesis

Paper
Code

Explicit Intensity Control for Accented Text-to-speech

no code implementations • 27 Oct 2022 • Rui Liu, Haolin Zuo, De Hu, Guanglai Gao, Haizhou Li

Accented text-to-speech (TTS) synthesis seeks to generate speech with an accent (L2) as a variant of the standard version (L1).

speech-recognition Speech Recognition

Paper
Add Code

Exploiting modality-invariant feature for robust multimodal emotion recognition with missing modalities

1 code implementation • 27 Oct 2022 • Haolin Zuo, Rui Liu, Jinming Zhao, Guanglai Gao, Haizhou Li

Multimodal emotion recognition leverages complementary information across modalities to gain performance.

Multimodal Emotion Recognition

Paper
Code

FineD-Eval: Fine-grained Automatic Dialogue-Level Evaluation

2 code implementations • 25 Oct 2022 • Chen Zhang, Luis Fernando D'Haro, Qiquan Zhang, Thomas Friedrichs, Haizhou Li

Recent model-based reference-free metrics for open-domain dialogue evaluation exhibit promising correlations with human judgment.

Dialogue Evaluation

Paper
Code

Mixed-EVC: Mixed Emotion Synthesis and Control in Voice Conversion

no code implementations • 25 Oct 2022 • Kun Zhou, Berrak Sisman, Carlos Busso, Bin Ma, Haizhou Li

To achieve this, we propose a novel EVC framework, Mixed-EVC, which only leverages discrete emotion training labels.

Attribute Voice Conversion

Paper
Add Code

Analyzing and Evaluating Faithfulness in Dialogue Summarization

1 code implementation • 21 Oct 2022 • Bin Wang, Chen Zhang, Yan Zhang, Yiming Chen, Haizhou Li

The factual correctness of summaries has the highest priority before practical applications.

Text Summarization

Paper
Code

Training Spiking Neural Networks with Local Tandem Learning

1 code implementation • 10 Oct 2022 • Qu Yang, Jibin Wu, Malu Zhang, Yansong Chua, Xinchao Wang, Haizhou Li

The LTL rule follows the teacher-student learning approach by mimicking the intermediate feature representations of a pre-trained ANN.

Paper
Code

A Focused Study on Sequence Length for Dialogue Summarization

1 code implementation • 24 Sep 2022 • Bin Wang, Chen Zhang, Chengwei Wei, Haizhou Li

Output length is critical to dialogue summarization systems.

dialogue summary

Paper
Code

The Kriston AI System for the VoxCeleb Speaker Recognition Challenge 2022

no code implementations • 23 Sep 2022 • Qutang Cai, Guoqiang Hong, Zhijian Ye, Ximin Li, Haizhou Li

This technical report describes our system for track 1, 2 and 4 of the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22).

Action Detection Activity Detection +2

Paper
Add Code

Controllable Accented Text-to-Speech Synthesis

no code implementations • 22 Sep 2022 • Rui Liu, Berrak Sisman, Guanglai Gao, Haizhou Li

Accented TTS synthesis is challenging as L2 is different from L1 in both in terms of phonetic rendering and prosody pattern.

Speech Synthesis Text-To-Speech Synthesis

Paper
Add Code

The 2022 Far-field Speaker Verification Challenge: Exploring domain mismatch and semi-supervised learning under the far-field scenario

1 code implementation • 12 Sep 2022 • Xiaoyi Qin, Ming Li, Hui Bu, Shrikanth Narayanan, Haizhou Li

In addition, a supplementary set for the FFSVC2020 dataset is released this year.

Speaker Verification

Paper
Code

Speech Synthesis with Mixed Emotions

no code implementations • 11 Aug 2022 • Kun Zhou, Berrak Sisman, Rajib Rana, B. W. Schuller, Haizhou Li

We then incorporate our formulation into a sequence-to-sequence emotional text-to-speech framework.

Attribute Emotional Speech Synthesis

Paper
Add Code

PoLyScriber: Integrated Fine-tuning of Extractor and Lyrics Transcriber for Polyphonic Music

no code implementations • 15 Jul 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Lyrics transcription of polyphonic music is challenging as the background music affects lyrics intelligibility.

Paper
Add Code

Accurate Emotion Strength Assessment for Seen and Unseen Speech Based on Data-Driven Deep Learning

1 code implementation • 15 Jun 2022 • Rui Liu, Berrak Sisman, Björn Schuller, Guanglai Gao, Haizhou Li

In this paper, we propose a data-driven deep learning model, i. e. StrengthNet, to improve the generalization of emotion strength assessment for seen and unseen speech.

Attribute Emotion Classification +2

Paper
Code

M3ED: Multi-modal Multi-scene Multi-label Emotional Dialogue Database

1 code implementation • ACL 2022 • Jinming Zhao, Tenggan Zhang, Jingwen Hu, Yuchen Liu, Qin Jin, Xinchao Wang, Haizhou Li

In this work, we propose a Multi-modal Multi-scene Multi-label Emotional Dialogue dataset, M3ED, which contains 990 dyadic emotional dialogues from 56 different TV series, a total of 9, 082 turns and 24, 449 utterances.

Cultural Vocal Bursts Intensity Prediction Emotion Recognition

Paper
Code

Music-robust Automatic Lyrics Transcription of Polyphonic Music

1 code implementation • 7 Apr 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

To improve the robustness of lyrics transcription to the background music, we propose a strategy of combining the features that emphasize the singing vocals, i. e. music-removed features that represent singing vocal extracted features, and the features that capture the singing vocals as well as the background music, i. e. music-present features.

Automatic Lyrics Transcription Language Modelling

Paper
Code

Genre-conditioned Acoustic Models for Automatic Lyrics Transcription of Polyphonic Music

no code implementations • 7 Apr 2022 • Xiaoxue Gao, Chitralekha Gupta, Haizhou Li

Lyrics transcription of polyphonic music is challenging not only because the singing vocals are corrupted by the background music, but also because the background music and the singing style vary across music genres, such as pop, metal, and hip hop, which affects lyrics intelligibility of the song in different ways.

Automatic Lyrics Transcription

Paper
Add Code

A Hybrid Continuity Loss to Reduce Over-Suppression for Time-domain Target Speaker Extraction

1 code implementation • 31 Mar 2022 • Zexu Pan, Meng Ge, Haizhou Li

We propose a hybrid continuity loss function for time-domain speaker extraction algorithms to settle the over-suppression problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Code

Pre-Training Transformer Decoder for End-to-End ASR Model with Unpaired Speech Data

1 code implementation • 31 Mar 2022 • Junyi Ao, Ziqiang Zhang, Long Zhou, Shujie Liu, Haizhou Li, Tom Ko, LiRong Dai, Jinyu Li, Yao Qian, Furu Wei

In this way, the decoder learns to reconstruct original speech information with codes before learning to generate correct text.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

1,047

Paper
Code

Speaker Extraction with Co-Speech Gestures Cue

1 code implementation • 31 Mar 2022 • Zexu Pan, Xinyuan Qian, Haizhou Li

Speaker extraction seeks to extract the clean speech of a target speaker from a multi-talker mixture speech.

Speech Separation

Paper
Code

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

1 code implementation • 29 Mar 2022 • Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks with the HuBERT size, achieves comparable performance to the teacher model in most tasks with a reduction of 29% parameters, and obtains a $3. 5\times$ compression ratio in three SUPERB tasks, e. g., automatic speaker verification, keyword spotting, and intent classification, with a slight accuracy loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

Paper
Code

Just Rank: Rethinking Evaluation with Word and Sentence Similarities

1 code implementation • ACL 2022 • Bin Wang, C. -C. Jay Kuo, Haizhou Li

Word and sentence similarity tasks have become the de facto evaluation method.

Benchmarking Semantic Similarity +5

Paper
Code

L-SpEx: Localized Target Speaker Extraction

1 code implementation • 21 Feb 2022 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction aims to extract the target speaker's voice from a multi-talker speech mixture given an auxiliary reference utterance.

Target Speaker Extraction

Paper
Code

ADD 2022: the First Audio Deep Synthesis Detection Challenge

no code implementations • 17 Feb 2022 • Jiangyan Yi, Ruibo Fu, JianHua Tao, Shuai Nie, Haoxin Ma, Chenglong Wang, Tao Wang, Zhengkun Tian, Ye Bai, Cunhang Fan, Shan Liang, Shiming Wang, Shuai Zhang, Xinrui Yan, Le Xu, Zhengqi Wen, Haizhou Li, Zheng Lian, Bin Liu

Audio deepfake detection is an emerging topic, which was included in the ASVspoof 2021.

Audio Generation DeepFake Detection +1

Paper
Add Code

MFA: TDNN with Multi-scale Frequency-channel Attention for Text-independent Speaker Verification with Short Utterances

no code implementations • 3 Feb 2022 • Tianchi Liu, Rohan Kumar Das, Kong Aik Lee, Haizhou Li

The time delay neural network (TDNN) represents one of the state-of-the-art of neural solutions to text-independent speaker verification.

Text-Independent Speaker Verification

Paper
Add Code

Emotion Intensity and its Control for Emotional Voice Conversion

no code implementations • 10 Jan 2022 • Kun Zhou, Berrak Sisman, Rajib Rana, Björn W. Schuller, Haizhou Li

As desired, the proposed network controls the fine-grained emotion intensity in the output speech.

Emotion Classification Voice Conversion

Paper
Add Code

MDD-Eval: Self-Training on Augmented Data for Multi-Domain Dialogue Evaluation

1 code implementation • 14 Dec 2021 • Chen Zhang, Luis Fernando D'Haro, Thomas Friedrichs, Haizhou Li

Chatbots are designed to carry out human-like conversations across different domains, such as general chit-chat, knowledge exchange, and persona-grounded conversations.

Ranked #1 on Dialogue Evaluation on USR-TopicalChat

Dialogue Evaluation

Paper
Code

HLT-NUS SUBMISSION FOR 2020 NIST Conversational Telephone Speech SRE

3 code implementations • 12 Nov 2021 • Rohan Kumar Das, Ruijie Tao, Haizhou Li

This work provides a brief description of Human Language Technology (HLT) Laboratory, National University of Singapore (NUS) system submission for 2020 NIST conversational telephone speech (CTS) speaker recognition evaluation (SRE).

Domain Adaptation Speaker Recognition

537

Paper
Code

MEmoBERT: Pre-training Model with Prompt-based Learning for Multimodal Emotion Recognition

no code implementations • 27 Oct 2021 • Jinming Zhao, Ruichen Li, Qin Jin, Xinchao Wang, Haizhou Li

Multimodal emotion recognition study is hindered by the lack of labelled corpora in terms of scale and diversity, due to the high annotation cost and label ambiguity.

Emotion Classification Multimodal Emotion Recognition +1

Paper
Add Code

Disentanglement of Emotional Style and Speaker Identity for Expressive Voice Conversion

no code implementations • 20 Oct 2021 • Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

Expressive voice conversion performs identity conversion for emotional speakers by jointly converting speaker identity and emotional style.

Disentanglement Voice Conversion

Paper
Add Code

DeepA: A Deep Neural Analyzer For Speech And Singing Vocoding

no code implementations • 13 Oct 2021 • Sergey Nikonorov, Berrak Sisman, Mingyang Zhang, Haizhou Li

At the same time, as the deep neural analyzer is learnable, it is expected to be more accurate for signal reconstruction and manipulation, and generalizable from speech to singing.

Speech Synthesis Voice Conversion

Paper
Add Code

Ego4D: Around the World in 3,000 Hours of Egocentric Video

7 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

5,203

Paper
Code

Self-supervised Speaker Recognition with Loss-gated Learning

1 code implementation • 8 Oct 2021 • Ruijie Tao, Kong Aik Lee, Rohan Kumar Das, Ville Hautamäki, Haizhou Li

In self-supervised learning for speaker recognition, pseudo labels are useful as the supervision signals.

Self-Supervised Learning Speaker Recognition

Paper
Code

VisualTTS: TTS with Accurate Lip-Speech Synchronization for Automatic Voice Over

no code implementations • 7 Oct 2021 • Junchen Lu, Berrak Sisman, Rui Liu, Mingyang Zhang, Haizhou Li

The proposed VisualTTS adopts two novel mechanisms that are 1) textual-visual attention, and 2) visual fusion strategy during acoustic decoding, which both contribute to forming accurate alignment between the input text content and lip motion in input lip sequence.

Speech Synthesis

Paper
Add Code

StrengthNet: Deep Learning-based Emotion Strength Assessment for Emotional Speech Synthesis

1 code implementation • 7 Oct 2021 • Rui Liu, Berrak Sisman, Haizhou Li

The emotion strength of synthesized speech can be controlled flexibly using a strength descriptor, which is obtained by an emotion attribute ranking function.

Attribute Data Augmentation +2

Paper
Code

Investigating the Impact of Pre-trained Language Models on Dialog Evaluation

no code implementations • 5 Oct 2021 • Chen Zhang, Luis Fernando D'Haro, Yiming Chen, Thomas Friedrichs, Haizhou Li

Yet, the impact of different Pr-LMs on the performance of automatic metrics is not well-understood.

Dialogue Evaluation

Paper
Add Code

Revisiting Self-Training for Few-Shot Learning of Language Model

1 code implementation • EMNLP 2021 • Yiming Chen, Yan Zhang, Chen Zhang, Grandee Lee, Ran Cheng, Haizhou Li

In this work, we revisit the self-training technique for language model fine-tuning and present a state-of-the-art prompt-based few-shot learner, SFLM.

Benchmarking Few-Shot Learning +6

Paper
Code

PL-EESR: Perceptual Loss Based END-TO-END Robust Speaker Representation Extraction

1 code implementation • 3 Oct 2021 • Yi Ma, Kong Aik Lee, Ville Hautamaki, Haizhou Li

Speech enhancement aims to improve the perceptual quality of the speech signal by suppression of the background noise.

Speaker Identification Speaker Verification +1

Paper
Code

USEV: Universal Speaker Extraction with Visual Cue

1 code implementation • 30 Sep 2021 • Zexu Pan, Meng Ge, Haizhou Li

The speaker extraction algorithm requires an auxiliary reference, such as a video recording or a pre-recorded speech, to form top-down auditory attention on the target speaker.

Paper
Code

Exploring Teacher-Student Learning Approach for Multi-lingual Speech-to-Intent Classification

no code implementations • 28 Sep 2021 • Bidisha Sharma, Maulik Madhavi, Xuehao Zhou, Haizhou Li

In particular, we use synthesized speech generated from an English-Mandarin text corpus for analysis and training of a multi-lingual intent classification model.

Classification intent-classification +1

Paper
Add Code

Knowledge Distillation from BERT Transformer to Speech Transformer for Intent Classification

1 code implementation • 5 Aug 2021 • Yidi Jiang, Bidisha Sharma, Maulik Madhavi, Haizhou Li

In this regard, we leverage the reliable and widely used bidirectional encoder representations from transformers (BERT) model as a language model and transfer the knowledge to build an acoustic model for intent classification using the speech.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

Paper
Code

Bootstrapped Unsupervised Sentence Representation Learning

1 code implementation • ACL 2021 • Yan Zhang, Ruidan He, Zuozhu Liu, Lidong Bing, Haizhou Li

As high-quality labeled data is scarce, unsupervised sentence representation learning has attracted much attention.

Representation Learning Semantic Textual Similarity +2

Paper
Code

Is Someone Speaking? Exploring Long-term Temporal Features for Audio-visual Active Speaker Detection

4 code implementations • 14 Jul 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.

Audio-Visual Active Speaker Detection

262

Paper
Code

Serialized Multi-Layer Multi-Head Attention for Neural Speaker Embedding

no code implementations • 14 Jul 2021 • Hongning Zhu, Kong Aik Lee, Haizhou Li

Instead of utilizing multi-head attention in parallel, the proposed serialized multi-layer multi-head attention is designed to aggregate and propagate attentive statistics from one layer to the next in a serialized manner.

Text-Independent Speaker Verification

Paper
Add Code

Expressive Voice Conversion: A Joint Framework for Speaker Identity and Emotional Style Transfer

no code implementations • 8 Jul 2021 • Zongyang Du, Berrak Sisman, Kun Zhou, Haizhou Li

Traditional voice conversion(VC) has been focused on speaker identity conversion for speech with a neutral expression.

Speech Emotion Recognition Style Transfer +1

Paper
Add Code

Selective Listening by Synchronizing Speech with Lips

1 code implementation • 14 Jun 2021 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

A speaker extraction algorithm seeks to extract the speech of a target speaker from a multi-talker speech mixture when given a cue that represents the target speaker, such as a pre-enrolled speech utterance, or an accompanying video track.

Lip Reading Target Speaker Extraction

Paper
Code

DynaEval: Unifying Turn and Dialogue Level Evaluation

1 code implementation • ACL 2021 • Chen Zhang, Yiming Chen, Luis Fernando D'Haro, Yan Zhang, Thomas Friedrichs, Grandee Lee, Haizhou Li

Effective evaluation metrics should reflect the dynamics of such interaction.

Dialogue Evaluation

Paper
Code

NUS-HLT Report for ActivityNet Challenge 2021 AVA (Speaker)

1 code implementation • The ActivityNet Large-Scale Activity Recognition Challenge Workshop, CVPR 2021 • Ruijie Tao, Zexu Pan, Rohan Kumar Das, Xinyuan Qian, Mike Zheng Shou, Haizhou Li

Active speaker detection (ASD) seeks to detect who is speaking in a visual scene of one or more speakers.

Ranked #9 on Audio-Visual Active Speaker Detection on AVA-ActiveSpeaker

Audio-Visual Active Speaker Detection

262

Paper
Code

Emotional Voice Conversion: Theory, Databases and ESD

1 code implementation • 31 May 2021 • Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li

In this paper, we first provide a review of the state-of-the-art emotional voice conversion research, and the existing emotional speech databases.

Voice Conversion

299

Paper
Code

The Multi-speaker Multi-style Voice Cloning Challenge 2021

no code implementations • 5 Apr 2021 • Qicong Xie, Xiaohai Tian, Guanghou Liu, Kun Song, Lei Xie, Zhiyong Wu, Hai Li, Song Shi, Haizhou Li, Fen Hong, Hui Bu, Xin Xu

The challenge consists of two tracks, namely few-shot track and one-shot track, where the participants are required to clone multiple target voices with 100 and 5 samples respectively.

Benchmarking Voice Cloning

Paper
Add Code

Reinforcement Learning for Emotional Text-to-Speech Synthesis with Improved Emotion Discriminability

no code implementations • 3 Apr 2021 • Rui Liu, Berrak Sisman, Haizhou Li

To our best knowledge, this is the first study of reinforcement learning in emotional text-to-speech synthesis.

reinforcement-learning Reinforcement Learning (RL) +3

Paper
Add Code

Limited Data Emotional Voice Conversion Leveraging Text-to-Speech: Two-stage Sequence-to-Sequence Training

2 code implementations • 31 Mar 2021 • Kun Zhou, Berrak Sisman, Haizhou Li

In stage 2, we perform emotion training with a limited amount of emotional speech data, to learn how to disentangle emotional style and linguistic information from the speech.

Voice Conversion

Paper
Code

Target Speaker Verification with Selective Auditory Attention for Single and Multi-talker Speech

1 code implementation • 30 Mar 2021 • Chenglin Xu, Wei Rao, Jibin Wu, Haizhou Li

Inspired by the study on target speaker extraction, e. g., SpEx, we propose a unified speaker verification framework for both single- and multi-talker speech, that is able to pay selective auditory attention to the target speaker.

Multi-Task Learning Speaker Verification +1

145

Paper
Code

Leveraging Acoustic and Linguistic Embeddings from Pretrained speech and language Models for Intent Classification

no code implementations • 15 Feb 2021 • Bidisha Sharma, Maulik Madhavi, Haizhou Li

An intent classification system is usually implemented as a pipeline process, with a speech recognition module followed by text processing that classifies the intents.

Classification General Classification +7

Paper
Add Code

Multi-stage Speaker Extraction with Utterance and Frame-Level Reference Signals

no code implementations • 19 Nov 2020 • Meng Ge, Chenglin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

Speaker extraction requires a sample speech from the target speaker as the reference.

Paper
Add Code

VAW-GAN for Disentanglement and Recomposition of Emotional Elements in Speech

no code implementations • 3 Nov 2020 • Kun Zhou, Berrak Sisman, Haizhou Li

Emotional voice conversion (EVC) aims to convert the emotion of speech from one state to another while preserving the linguistic content and speaker identity.

Decoder Disentanglement +2

Paper
Add Code

Seen and Unseen emotional style transfer for voice conversion with a new emotional speech dataset

2 code implementations • 28 Oct 2020 • Kun Zhou, Berrak Sisman, Rui Liu, Haizhou Li

Emotional voice conversion aims to transform emotional prosody in speech while preserving the linguistic content and speaker identity.

Decoder Generative Adversarial Network +3

299

Paper
Code

Deep Convolutional Spiking Neural Networks for Keyword Spotting

no code implementations • Interspeech 2020 • Emre Yilmaz, Özgür Bora Gevrek, Jibin Wu, Yuxiang Chen, Xuanbo Meng, Haizhou Li

To explore the effectiveness and computational complexity of SNN on KWS and wakeword detection, we compare the performance and computational costs of spiking fully-connected and convolutional neural networks with ANN counterparts under clean and noisy testing conditions.

Keyword Spotting

Paper
Add Code

GraphSpeech: Syntax-Aware Graph Attention Network For Neural Speech Synthesis

no code implementations • 23 Oct 2020 • Rui Liu, Berrak Sisman, Haizhou Li

Attention-based end-to-end text-to-speech synthesis (TTS) is superior to conventional statistical methods in many ways.

Graph Attention Sentence +2

Paper
Add Code

Muse: Multi-modal target speaker extraction with visual cues

1 code implementation • 15 Oct 2020 • Zexu Pan, Ruijie Tao, Chenglin Xu, Haizhou Li

Speaker extraction algorithm relies on the speech sample from the target speaker as the reference point to focus its attention.

Target Speaker Extraction

Paper
Code

Speaker-Utterance Dual Attention for Speaker and Utterance Verification

no code implementations • 20 Aug 2020 • Tianchi Liu, Rohan Kumar Das, Maulik Madhavi, ShengMei Shen, Haizhou Li

The proposed SUDA features an attention mask mechanism to learn the interaction between the speaker and utterance information streams.

Speaker Verification

Paper
Add Code

Modeling Prosodic Phrasing with Multi-Task Learning in Tacotron-based TTS

no code implementations • 11 Aug 2020 • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

We propose a multi-task learning scheme for Tacotron training, that optimizes the system to predict both Mel spectrum and phrase breaks.

Multi-Task Learning Speech Synthesis

Paper
Add Code

Spectrum and Prosody Conversion for Cross-lingual Voice Conversion with CycleGAN

no code implementations • 11 Aug 2020 • Zongyang Du, Kun Zhou, Berrak Sisman, Haizhou Li

It relies on non-parallel training data from two different languages, hence, is more challenging than mono-lingual voice conversion.

Voice Conversion

Paper
Add Code

VAW-GAN for Singing Voice Conversion with Non-parallel Training Data

no code implementations • 10 Aug 2020 • Junchen Lu, Kun Zhou, Berrak Sisman, Haizhou Li

We train an encoder to disentangle singer identity and singing prosody (F0 contour) from phonetic content.

Decoder Generative Adversarial Network +1

Paper
Add Code

Multi-Tones' Phase Coding (MTPC) of Interaural Time Difference by Spiking Neural Network

no code implementations • 7 Jul 2020 • Zihan Pan, Malu Zhang, Jibin Wu, Haizhou Li

Inspired by the mammal's auditory localization pathway, in this paper we propose a pure spiking neural network (SNN) based computational model for precise sound localization in the noisy real-world environment, and implement this algorithm in a real-time robotic system with a microphone array.

Paper
Add Code

Progressive Tandem Learning for Pattern Recognition with Deep Spiking Neural Networks

no code implementations • 2 Jul 2020 • Jibin Wu, Cheng-Lin Xu, Daquan Zhou, Haizhou Li, Kay Chen Tan

In this paper, we propose a novel ANN-to-SNN conversion and layer-wise learning framework for rapid and efficient pattern recognition, which is referred to as progressive tandem learning of deep SNNs.

Computational Efficiency Image Reconstruction +2

Paper
Add Code

Modeling Code-Switch Languages Using Bilingual Parallel Corpus

no code implementations • ACL 2020 • Gr Lee, ee, Haizhou Li

A bilingual language model is expected to model the sequential dependency for words across languages, which is difficult due to the inherent lack of suitable training data as well as diverse syntactic structure across languages.

Bilingual Lexicon Induction Language Modelling +1

Paper
Add Code

You Only Spike Once: Improving Energy-Efficient Neuromorphic Inference to ANN-Level Accuracy

no code implementations • 3 Jun 2020 • Srivatsa P, Kyle Timothy Ng Chu, Burin Amornpaisannon, Yaswanth Tavva, Venkata Pavan Kumar Miriyala, Jibin Wu, Malu Zhang, Haizhou Li, Trevor E. Carlson

Rate-encoded SNNs could be seen as inefficient as an encoding scheme because it involves the transmission of a large number of spikes.

Paper
Add Code

Converting Anyone's Emotion: Towards Speaker-Independent Emotional Voice Conversion

1 code implementation • 13 May 2020 • Kun Zhou, Berrak Sisman, Mingyang Zhang, Haizhou Li

We consider that there is a common code between speakers for emotional expression in a spoken language, therefore, a speaker-independent mapping between emotional states is possible.

Decoder Voice Conversion

Paper
Code

SpEx+: A Complete Time Domain Speaker Extraction Network

no code implementations • 10 May 2020 • Meng Ge, Cheng-Lin Xu, Longbiao Wang, Eng Siong Chng, Jianwu Dang, Haizhou Li

To eliminate such mismatch, we propose a complete time-domain speaker extraction solution, that is called SpEx+.

Ranked #1 on Speech Extraction on WSJ0-2mix-extr

Speech Extraction Audio and Speech Processing Sound

Paper
Add Code

Time-domain speaker extraction network

no code implementations • 29 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

The inaccuracy of phase estimation is inherent to the frequency domain processing, that affects the quality of signal reconstruction.

Audio and Speech Processing Sound

Paper
Add Code

SpEx: Multi-Scale Time Domain Speaker Extraction Network

1 code implementation • 17 Apr 2020 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

Inspired by Conv-TasNet, we propose a time-domain speaker extraction network (SpEx) that converts the mixture speech into multi-scale embedding coefficients instead of decomposing the speech signal into magnitude and phase spectra.

Decoder Multi-Task Learning

145

Paper
Code

Rectified Linear Postsynaptic Potential Function for Backpropagation in Deep Spiking Neural Networks

no code implementations • 26 Mar 2020 • Malu Zhang, Jiadong Wang, Burin Amornpaisannon, Zhixuan Zhang, VPK Miriyala, Ammar Belatreche, Hong Qu, Jibin Wu, Yansong Chua, Trevor E. Carlson, Haizhou Li

In STDBP algorithm, the timing of individual spikes is used to convey information (temporal coding), and learning (back-propagation) is performed based on spike timing in an event-driven manner.

Decision Making

Paper
Add Code

WaveTTS: Tacotron-based TTS with Joint Time-Frequency Domain Loss

no code implementations • 2 Feb 2020 • Rui Liu, Berrak Sisman, Feilong Bao, Guanglai Gao, Haizhou Li

To address this problem, we propose a new training scheme for Tacotron-based TTS, referred to as WaveTTS, that has 2 loss functions: 1) time-domain loss, denoted as the waveform loss, that measures the distortion between the natural and generated waveform; and 2) frequency-domain loss, that measures the Mel-scale acoustic feature loss between the natural and generated acoustic features.

Paper
Add Code

Transforming Spectrum and Prosody for Emotional Voice Conversion with Non-Parallel Training Data

1 code implementation • 1 Feb 2020 • Kun Zhou, Berrak Sisman, Haizhou Li

Many studies require parallel speech data between different emotional patterns, which is not practical in real life.

Voice Conversion

121

Paper
Code

Independent language modeling architecture for end-to-end ASR

no code implementations • 25 Nov 2019 • Van Tung Pham, Hai-Hua Xu, Yerbolat Khassanov, Zhiping Zeng, Eng Siong Chng, Chongjia Ni, Bin Ma, Haizhou Li

To address this problem, in this work, we propose a new architecture that separates the decoder subnet from the encoder output.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

Paper
Add Code

Deep Spiking Neural Networks for Large Vocabulary Automatic Speech Recognition

1 code implementation • 19 Nov 2019 • Jibin Wu, Emre Yilmaz, Malu Zhang, Haizhou Li, Kay Chen Tan

The brain-inspired spiking neural networks (SNN) closely mimic the biological neural networks and can operate on low-power neuromorphic hardware with spike-based computation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Code

Teacher-Student Training for Robust Tacotron-based TTS

no code implementations • 7 Nov 2019 • Rui Liu, Berrak Sisman, Jingdong Li, Feilong Bao, Guanglai Gao, Haizhou Li

We first train a Tacotron2-based TTS model by always providing natural speech frames to the decoder, that serves as a teacher model.

Decoder Knowledge Distillation

Paper
Add Code

End-to-End Code-Switching ASR for Low-Resourced Language Pairs

no code implementations • 27 Sep 2019 • Xianghu Yue, Grandee Lee, Emre Yilmaz, Fang Deng, Haizhou Li

In this work, we describe an E2E ASR pipeline for the recognition of CS speech in which a low-resourced language is mixed with a high resourced language.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

Automatic Lyrics Alignment and Transcription in Polyphonic Music: Does Background Music Help?

no code implementations • 23 Sep 2019 • Chitralekha Gupta, Emre Yilmaz, Haizhou Li

Automatic lyrics alignment and transcription in polyphonic music are challenging tasks because the singing vocals are corrupted by the background music.

Audio and Speech Processing Sound

Paper
Add Code

Neural Population Coding for Effective Temporal Classification

no code implementations • 12 Sep 2019 • Zihan Pan, Jibin Wu, Yansong Chua, Malu Zhang, Haizhou Li

We show that, with population neural codings, the encoded patterns are linearly separable using the Support Vector Machine (SVM).

Classification General Classification

Paper
Add Code

An efficient and perceptually motivated auditory neural encoding and decoding algorithm for spiking neural networks

no code implementations • 3 Sep 2019 • Zihan Pan, Yansong Chua, Jibin Wu, Malu Zhang, Haizhou Li, Eliathamby Ambikairajah

The neural encoding scheme, that we call Biologically plausible Auditory Encoding (BAE), emulates the functions of the perceptual components of the human auditory system, that include the cochlear filter bank, the inner hair cells, auditory masking effects from psychoacoustic models, and the spike neural encoding by the auditory nerve.

Benchmarking speech-recognition +1

Paper
Add Code

A Tandem Learning Rule for Effective Training and Rapid Inference of Deep Spiking Neural Networks

1 code implementation • 2 Jul 2019 • Jibin Wu, Yansong Chua, Malu Zhang, Guoqi Li, Haizhou Li, Kay Chen Tan

Spiking neural networks (SNNs) represent the most prominent biologically inspired computing model for neuromorphic computing (NC) architectures.

Event-based vision

Paper
Code

Acoustic Modeling for Automatic Lyrics-to-Audio Alignment

no code implementations • 25 Jun 2019 • Chitralekha Gupta, Emre Yilmaz, Haizhou Li

In this work, we propose (1) using additional speech and music-informed features and (2) adapting the acoustic models trained on a large amount of solo singing vocals towards polyphonic music using a small amount of in-domain data.

Paper
Add Code

Large-Scale Speaker Diarization of Radio Broadcast Archives

no code implementations • 19 Jun 2019 • Emre Yilmaz, Adem Derinel, Zhou Kun, Henk van den Heuvel, Niko Brummer, Haizhou Li, David A. van Leeuwen

This paper describes our initial efforts to build a large-scale speaker diarization (SD) and identification system on a recently digitized radio broadcast archive from the Netherlands which has more than 6500 audio tapes with 3000 hours of Frisian-Dutch speech recorded between 1950-2016.

speaker-diarization Speaker Diarization +1

Paper
Add Code

Code-Switching Detection Using ASR-Generated Language Posteriors

no code implementations • 19 Jun 2019 • Qinyi Wang, Emre Yilmaz, Adem Derinel, Haizhou Li

Code-switching (CS) detection refers to the automatic detection of language switches in code-mixed utterances.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Paper
Add Code

Multi-Graph Decoding for Code-Switching ASR

no code implementations • 18 Jun 2019 • Emre Yilmaz, Samuel Cohen, Xianghu Yue, David van Leeuwen, Haizhou Li

This archive contains recordings with monolingual Frisian and Dutch speech segments as well as Frisian-Dutch CS speech, hence the recognition performance on monolingual segments is also vital for accurate transcriptions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Paper
Add Code

VQVAE Unsupervised Unit Discovery and Multi-scale Code2Spec Inverter for Zerospeech Challenge 2019

no code implementations • 27 May 2019 • Andros Tjandra, Berrak Sisman, Mingyang Zhang, Sakriani Sakti, Haizhou Li, Satoshi Nakamura

Our proposed approach significantly improved the intelligibility (in CER), the MOS, and discrimination ABX scores compared to the official ZeroSpeech 2019 baseline or even the topline.

Clustering

Paper
Add Code

I4U Submission to NIST SRE 2018: Leveraging from a Decade of Shared Experiences

no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans

The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).

Domain Adaptation Speaker Recognition

Paper
Add Code

Joint training framework for text-to-speech and voice conversion using multi-source Tacotron and WaveNet

no code implementations • 29 Mar 2019 • Mingyang Zhang, Xin Wang, Fuming Fang, Haizhou Li, Junichi Yamagishi

We propose using an extended model architecture of Tacotron, that is a multi-source sequence-to-sequence model with a dual attention mechanism as the shared model for both the TTS and VC tasks.

Decoder Speech Synthesis +1

Paper
Add Code

Optimization of Speaker Extraction Neural Network with Magnitude and Temporal Spectrum Approximation Loss

1 code implementation • 24 Mar 2019 • Cheng-Lin Xu, Wei Rao, Eng Siong Chng, Haizhou Li

The SpeakerBeam-FE (SBF) method is proposed for speaker extraction.

145

Paper
Code

Deep Spiking Neural Network with Spike Count based Learning Rule

no code implementations • 15 Feb 2019 • Jibin Wu, Yansong Chua, Malu Zhang, Qu Yang, Guoqi Li, Haizhou Li

Deep spiking neural networks (SNNs) support asynchronous event-driven computation, massive parallelism and demonstrate great potential to improve the energy efficiency of its synchronous analog counterpart.

Paper
Add Code

On the End-to-End Solution to Mandarin-English Code-switching Speech Recognition

1 code implementation • 1 Nov 2018 • Zhiping Zeng, Yerbolat Khassanov, Van Tung Pham, Hai-Hua Xu, Eng Siong Chng, Haizhou Li

Code-switching (CS) refers to a linguistic phenomenon where a speaker uses different languages in an utterance or between alternating utterances.

Data Augmentation Language Identification +3

Paper
Code

Generative x-vectors for text-independent speaker verification

no code implementations • 17 Sep 2018 • Longting Xu, Rohan Kumar Das, Emre Yilmaz, Jichen Yang, Haizhou Li

Speaker verification (SV) systems using deep neural network embeddings, so-called the x-vector systems, are becoming popular due to its good performance superior to the i-vector systems.

Text-Independent Speaker Verification

Paper
Add Code

Is Neuromorphic MNIST neuromorphic? Analyzing the discriminative power of neuromorphic datasets in the time domain

no code implementations • 3 Jul 2018 • Laxmi R. Iyer, Yansong Chua, Haizhou Li

We also use this SNN for further experiments on N-MNIST to show that rate based SNNs perform better, and precise spike timings are not important in N-MNIST.

Paper
Add Code

Named-Entity Tagging and Domain adaptation for Better Customized Translation

no code implementations • WS 2018 • Zhongwei Li, Xuancong Wang, Ai Ti Aw, Eng Siong Chng, Haizhou Li

Customized translation need pay spe-cial attention to the target domain ter-minology especially the named-entities for the domain.

Domain Adaptation Machine Translation +6

Paper
Add Code

Report of NEWS 2018 Named Entity Transliteration Shared Task

no code implementations • WS 2018 • Nancy Chen, Rafael E. Banchs, Min Zhang, Xiangyu Duan, Haizhou Li

This report presents the results from the Named Entity Transliteration Shared Task conducted as part of The Seventh Named Entities Workshop (NEWS 2018) held at ACL 2018 in Melbourne, Australia.

Information Retrieval Transliteration

Paper
Add Code

NEWS 2018 Whitepaper

no code implementations • WS 2018 • Nancy Chen, Xiangyu Duan, Min Zhang, Rafael E. Banchs, Haizhou Li

Transliteration is defined as phonetic translation of names across languages.

Benchmarking Translation +1

Paper
Add Code

Learning Acoustic Word Embeddings with Temporal Context for Query-by-Example Speech Search

no code implementations • 10 Jun 2018 • Yougen Yuan, Cheung-Chi Leung, Lei Xie, Hongjie Chen, Bin Ma, Haizhou Li

We also find that it is important to have sufficient speech segment pairs to train the deep CNN for effective acoustic word embeddings.

Dynamic Time Warping Word Embeddings

Paper
Add Code

A Multi-State Diagnosis and Prognosis Framework with Feature Learning for Tool Condition Monitoring

no code implementations • 30 Apr 2018 • Chong Zhang, Geok Soon Hong, Jun-Hong Zhou, Kay Chen Tan, Haizhou Li, Huan Xu, Jihoon Hong, Hian-Leng Chan

For fault diagnosis, a cost-sensitive deep belief network (namely ECS-DBN) is applied to deal with the imbalanced data problem for tool state estimation.

Representation Learning

Paper
Add Code

A Cost-Sensitive Deep Belief Network for Imbalanced Classification

no code implementations • 28 Apr 2018 • Chong Zhang, Kay Chen Tan, Haizhou Li, Geok Soon Hong

Adaptive differential evolution optimization is implemented as the optimization algorithm that automatically updates its corresponding parameters without the need of prior domain knowledge.

Classification General Classification +1

Paper
Add Code

Statistical Parametric Speech Synthesis Using Generative Adversarial Networks Under A Multi-task Learning Framework

4 code implementations • 6 Jul 2017 • Shan Yang, Lei Xie, Xiao Chen, Xiaoyan Lou, Xuan Zhu, Dong-Yan Huang, Haizhou Li

In this paper, we aim at improving the performance of synthesized speech in statistical parametric speech synthesis (SPSS) based on a generative adversarial network (GAN).

Sound

513

Paper
Code

Report of NEWS 2016 Machine Transliteration Shared Task

no code implementations • WS 2016 • Xiangyu Duan, Rafael Banchs, Min Zhang, Haizhou Li, A. Kumaran

Information Retrieval Transliteration

Paper
Add Code

Whitepaper of NEWS 2016 Shared Task on Machine Transliteration

no code implementations • WS 2016 • Xiangyu Duan, Min Zhang, Haizhou Li, Rafael Banchs, A. Kumaran

Transliteration

Paper
Add Code

Exploring Convolutional and Recurrent Neural Networks in Sequential Labelling for Dialogue Topic Tracking

no code implementations • ACL 2016 • Seokhwan Kim, Rafael Banchs, Haizhou Li

Paper
Add Code

Evaluating and Combining Name Entity Recognition Systems

no code implementations • WS 2016 • Ridong Jiang, Rafael E. Banchs, Haizhou Li

Information Retrieval Machine Translation +2

Paper
Add Code

Spoofing detection under noisy conditions: a preliminary investigation and an initial database

no code implementations • 9 Feb 2016 • Xiaohai Tian, Zhizheng Wu, Xiong Xiao, Eng Siong Chng, Haizhou Li

To simulate the real-life scenarios, we perform a preliminary investigation of spoofing detection under additive noisy conditions, and also describe an initial database for this task.

Speaker Verification

Paper
Add Code

Fantastic 4 system for NIST 2015 Language Recognition Evaluation

no code implementations • 5 Feb 2016 • Kong Aik Lee, Ville Hautamäki, Anthony Larcher, Wei Rao, Hanwu Sun, Trung Hieu Nguyen, Guangsen Wang, Aleksandr Sizov, Ivan Kukanov, Amir Poorjam, Trung Ngo Trong, Xiong Xiao, Cheng-Lin Xu, Hai-Hua Xu, Bin Ma, Haizhou Li, Sylvain Meignier

This article describes the systems jointly submitted by Institute for Infocomm (I$^2$R), the Laboratoire d'Informatique de l'Universit\'e du Maine (LIUM), Nanyang Technology University (NTU) and the University of Eastern Finland (UEF) for 2015 NIST Language Recognition Evaluation (LRE).

regression