1 code implementation • Findings (EMNLP) 2021 • Boer Lyu, Lu Chen, Kai Yu
Sememes are defined as the atomic units to describe the semantic meaning of concepts.
no code implementations • IWSLT (ACL) 2022 • Qinpei Zhu, Renshou Wu, Guangfeng Liu, Xinyu Zhu, Xingyu Chen, Yang Zhou, Qingliang Miao, Rui Wang, Kai Yu
This paper describes AISP-SJTU’s submissions for the IWSLT 2022 Simultaneous Translation task.
no code implementations • 30 Apr 2024 • Hankun Wang, Chenpeng Du, Yiwei Guo, Shuai Wang, Xie Chen, Kai Yu
We call the attention maps of those heads Alignment-Emerged Attention Maps (AEAMs).
no code implementations • 23 Apr 2024 • Sen Liu, Yiwei Guo, Xie Chen, Kai Yu
While acoustic expressiveness has long been studied in expressive text-to-speech (ETTS), the inherent expressiveness in text lacks sufficient attention, especially for ETTS of artistic works.
no code implementations • 9 Apr 2024 • Yiwei Guo, Chenrun Wang, Yifan Yang, Hankun Wang, Ziyang Ma, Chenpeng Du, Shuai Wang, Hanzheng Li, Shuai Fan, Hui Zhang, Xie Chen, Kai Yu
Discrete speech tokens have been more and more popular in multiple speech processing fields, including automatic speech recognition (ASR), text-to-speech (TTS) and singing voice synthesis (SVS).
Automatic Speech Recognition (ASR) +2
2 code implementations • 8 Apr 2024 • Matteo Zecchin, Kai Yu, Osvaldo Simeone
In this work, we demonstrate that ICL can also be used to tackle the problem of multi-user equalization in cell-free MIMO systems with limited fronthaul capacity.
2 code implementations • 6 Apr 2024 • Hongchuan Zeng, Hongshen Xu, Lu Chen, Kai Yu
MBS overcomes the English-centric limitations of existing methods by sampling calibration data from various languages proportionally to the language distribution of the model training datasets.
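The proportional sampling idea can be sketched as follows. This is a minimal illustration only; `proportional_sample`, the language pools, and the distribution values are hypothetical, not the paper's actual implementation:

```python
import random

def proportional_sample(pools, lang_dist, n, seed=0):
    """Draw n calibration examples, allocating per-language counts in
    proportion to the training data's language distribution."""
    rng = random.Random(seed)
    total = sum(lang_dist.values())
    sample = []
    for lang, weight in lang_dist.items():
        k = round(n * weight / total)          # per-language quota
        sample.extend(rng.sample(pools[lang], min(k, len(pools[lang]))))
    return sample

# Hypothetical per-language candidate pools and training distribution.
pools = {"en": [f"en-{i}" for i in range(100)],
         "zh": [f"zh-{i}" for i in range(100)],
         "fr": [f"fr-{i}" for i in range(100)]}
dist = {"en": 0.6, "zh": 0.3, "fr": 0.1}
calib = proportional_sample(pools, dist, n=20)
```

With the distribution above, the 20-example calibration set splits 12/6/2 across English, Chinese, and French, mirroring the assumed training mix.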
no code implementations • 27 Mar 2024 • Hongshen Xu, Zichen Zhu, Situo Zhang, Da Ma, Shuai Fan, Lu Chen, Kai Yu
Large Language Models (LLMs) often generate erroneous outputs, known as hallucinations, due to their limitations in discerning questions beyond their knowledge scope.
no code implementations • 20 Mar 2024 • Yu Xi, Hao Li, Baochen Yang, Haoyu Li, Hainan Xu, Kai Yu
Designing an efficient keyword spotting (KWS) system that delivers exceptional performance on resource-constrained edge devices has long been a subject of significant attention.
no code implementations • 5 Mar 2024 • Yutong Li, Lu Chen, Aiwei Liu, Kai Yu, Lijie Wen
In this work, we first focus on the independent literature summarization step and introduce ChatCite, an LLM agent with human-workflow guidance for comparative literature summaries.
1 code implementation • 28 Feb 2024 • Hongshen Xu, Ruisheng Cao, Su Zhu, Sheng Jiang, Hanchong Zhang, Lu Chen, Kai Yu
Previous work on spoken language understanding (SLU) mainly focuses on single-intent settings, where each input utterance merely contains one user intent.
1 code implementation • 28 Feb 2024 • Hongshen Xu, Lu Chen, Zihan Zhao, Da Ma, Ruisheng Cao, Zichen Zhu, Kai Yu
Additionally, we propose several pre-training tasks to model the interaction among text, structure, and image modalities effectively.
no code implementations • 22 Feb 2024 • Yiming Ai, Zhiwei He, Ziyin Zhang, Wenhong Zhu, Hongkun Hao, Kai Yu, Lingjun Chen, Rui Wang
In this study, we investigate the reliability of Large Language Models (LLMs) in professing human-like personality traits through responses to personality questionnaires.
no code implementations • 5 Feb 2024 • Zichen Zhu, Yang Xu, Lu Chen, Jingkai Yang, Yichuan Ma, Yiming Sun, Hailin Wen, Jiaqi Liu, Jinyu Cai, Yingzi Ma, Situo Zhang, Zihan Zhao, Liangtai Sun, Kai Yu
Rapid progress in multimodal large language models (MLLMs) highlights the need to introduce challenging yet realistic benchmarks to the academic community, while existing benchmarks primarily focus on understanding simple natural images and short context.
no code implementations • 26 Jan 2024 • Zihan Zhao, Da Ma, Lu Chen, Liangtai Sun, Zihao Li, Hongshen Xu, Zichen Zhu, Su Zhu, Shuai Fan, Guodong Shen, Xin Chen, Kai Yu
To this end, we develop ChemDFM, the first LLM towards CGI.
no code implementations • 25 Jan 2024 • Chenpeng Du, Yiwei Guo, Hankun Wang, Yifan Yang, Zhikang Niu, Shuai Wang, Hui Zhang, Xie Chen, Kai Yu
Recent TTS models with decoder-only Transformer architecture, such as SPEAR-TTS and VALL-E, achieve impressive naturalness and demonstrate the ability for zero-shot adaptation given a speech prompt.
no code implementations • 12 Jan 2024 • Yu Xi, Baochen Yang, Hao Li, Jiaqi Guo, Kai Yu
Furthermore, experiments on the continuous speech dataset LibriSpeech demonstrate that, by incorporating audio discrimination, CLAD achieves significant performance gain over CL without audio discrimination.
no code implementations • 28 Dec 2023 • Biwen Lei, Kai Yu, Mengyang Feng, Miaomiao Cui, Xuansong Xie
Extensive experiments demonstrate that the proposed framework achieves excellent results in both domain adaptation and text-to-avatar tasks, outperforming existing methods in terms of generation quality and efficiency.
no code implementations • 14 Dec 2023 • Junjie Li, Yiwei Guo, Xie Chen, Kai Yu
Zero-shot voice conversion (VC) aims to transfer the source speaker timbre to arbitrary unseen target speaker timbre, while keeping the linguistic content unchanged.
no code implementations • 8 Dec 2023 • Mengyang Feng, Jinlin Liu, Kai Yu, Yuan Yao, Zheng Hui, Xiefan Guo, Xianhui Lin, Haolan Xue, Chen Shi, Xiaowen Li, Aojie Li, Xiaoyang Kang, Biwen Lei, Miaomiao Cui, Peiran Ren, Xuansong Xie
In this paper, we present DreaMoving, a diffusion-based controllable video generation framework to produce high-quality customized human videos.
no code implementations • 22 Nov 2023 • Kai Yu, Jinlin Liu, Mengyang Feng, Miaomiao Cui, Xuansong Xie
After the progressive training, the LoRA learns the 3D information of the generated object and eventually turns to an object-level 3D prior.
1 code implementation • 10 Nov 2023 • Matteo Zecchin, Kai Yu, Osvaldo Simeone
In ICL, a decision on a new input is made via a direct mapping of the input and of a few examples from the given task, serving as the task's context, to the output variable.
no code implementations • 3 Nov 2023 • Tao Liu, Chenpeng Du, Shuai Fan, Feilong Chen, Kai Yu
Our rigorous experiments comprehensively highlight that our ground-breaking approach outpaces existing methods with considerable margins and delivers seamless, intelligible videos in person-generic and multilingual scenarios.
no code implementations • 2 Nov 2023 • Hanglei Zhang, Yiwei Guo, Sen Liu, Xie Chen, Kai Yu
The LLM selects the best-matching style references from annotated utterances based on external style prompts, which can be raw input text or natural language style descriptions.
no code implementations • 28 Oct 2023 • Ruisheng Cao, Hanchong Zhang, Hongshen Xu, Jieyu Li, Da Ma, Lu Chen, Kai Yu
Text-to-SQL aims to generate an executable SQL program given the user utterance and the corresponding database schema.
1 code implementation • 26 Oct 2023 • Hanchong Zhang, Ruisheng Cao, Lu Chen, Hongshen Xu, Kai Yu
Recently Large Language Models (LLMs) have been proven to have strong abilities in various domains and tasks.
1 code implementation • 14 Sep 2023 • Yifan Yang, Feiyu Shen, Chenpeng Du, Ziyang Ma, Kai Yu, Daniel Povey, Xie Chen
Self-supervised learning (SSL) proficiency in speech-related tasks has driven research into utilizing discrete tokens for speech tasks like recognition and translation, which offer lower storage requirements and great potential to employ natural language processing techniques.
no code implementations • 10 Sep 2023 • Yiwei Guo, Chenpeng Du, Ziyang Ma, Xie Chen, Kai Yu
Although diffusion models in text-to-speech have become a popular choice due to their strong generative ability, the intrinsic complexity of sampling from diffusion models harms their efficiency.
1 code implementation • 25 Aug 2023 • Liangtai Sun, Yang Han, Zihan Zhao, Da Ma, Zhennan Shen, Baocai Chen, Lu Chen, Kai Yu
This design suffers from data leakage problem and lacks the evaluation of subjective Q/A ability.
1 code implementation • ICCV 2023 • Chun-Mei Feng, Kai Yu, Yong Liu, Salman Khan, Wangmeng Zuo
In this paper, we focus on a particular setting of learning adaptive prompts on the fly for each test sample from an unseen new domain, which is known as test-time prompt tuning (TPT).
1 code implementation • ICCV 2023 • Chun-Mei Feng, Kai Yu, Nian Liu, Xinxing Xu, Salman Khan, Wangmeng Zuo
However, the performance of the global model is often hampered by non-i.i.d. data.
no code implementations • 25 Jun 2023 • Sen Liu, Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
Although high-fidelity speech can be obtained for intralingual speech synthesis, cross-lingual text-to-speech (CTTS) is still far from satisfactory, as it is difficult to accurately retain the speaker timbres (i.e., speaker similarity) and eliminate the accents from their first language (i.e., nativeness).
no code implementations • 16 Jun 2023 • Hanxue Zhang, Zeyu Xie, Xuenan Xu, Mengyue Wu, Kai Yu
Automated audio captioning (AAC) is an important cross-modality translation task, aiming at generating descriptions for audio clips.
no code implementations • 14 Jun 2023 • Zheng Liang, Zheshu Song, Ziyang Ma, Chenpeng Du, Kai Yu, Xie Chen
Recently, end-to-end (E2E) automatic speech recognition (ASR) models have made great strides and exhibit excellent performance in general speech recognition.
Automatic Speech Recognition (ASR) +5
1 code implementation • NeurIPS 2023 • Danyang Zhang, Lu Chen, Situo Zhang, Hongshen Xu, Zihan Zhao, Kai Yu
By equipping the LLM with a long-term experience memory, REMEMBERER is capable of exploiting experiences from past episodes even for different task goals, outperforming an LLM-based agent with fixed exemplars or a transient working memory.
1 code implementation • 25 May 2023 • Hanchong Zhang, Jieyu Li, Lu Chen, Ruisheng Cao, Yunyan Zhang, Yu Huang, Yefeng Zheng, Kai Yu
Furthermore, we present CSS, a large-scale CrosS-Schema Chinese text-to-SQL dataset, to carry on corresponding studies.
1 code implementation • 23 May 2023 • Yiming Ai, Zhiwei He, Kai Yu, Rui Wang
Tense inconsistency frequently occurs in machine translation.
1 code implementation • NeurIPS 2023 • Guangyan Chen, Meiling Wang, Yi Yang, Kai Yu, Li Yuan, Yufeng Yue
Large language models (LLMs) based on the generative pre-training transformer (GPT) have demonstrated remarkable effectiveness across a diverse range of downstream tasks.
Ranked #3 on 3D Point Cloud Classification on ScanObjectNN (using extra training data)
1 code implementation • 14 May 2023 • Danyang Zhang, Hongshen Xu, Zihan Zhao, Lu Chen, Ruisheng Cao, Kai Yu
A GUI task set based on WikiHow app is collected on Mobile-Env to form a benchmark covering a range of GUI interaction capabilities.
no code implementations • 23 Apr 2023 • Zhijun Liu, Yiwei Guo, Kai Yu
In this work, we present DiffVoice, a novel text-to-speech model based on latent diffusion.
no code implementations • 30 Mar 2023 • Chenpeng Du, Qi Chen, Xie Chen, Kai Yu
Additionally, we propose a novel method for generating continuous video frames with the DDIM image decoder trained on individual frames, eliminating the need for modelling the joint distribution of consecutive frames directly.
no code implementations • 30 Jan 2023 • Meng Wang, Kai Yu, Chun-Mei Feng, Yiming Qian, Ke Zou, Lianyu Wang, Rick Siow Mong Goh, Yong Liu, Huazhu Fu
To the best of our knowledge, our proposed RFedDis is the first work to develop an FL approach based on evidential uncertainty combined with feature disentangling, which enhances the performance and reliability of FL in non-IID domain features.
no code implementations • 12 Jan 2023 • Jieyu Li, Lu Chen, Ruisheng Cao, Su Zhu, Hongshen Xu, Zhi Chen, Hanchong Zhang, Kai Yu
Exploring the generalization of a text-to-SQL parser is essential for a system to adapt automatically to real-world databases.
1 code implementation • 5 Dec 2022 • Luofang Jiao, Kai Yu, Yunting Xu, Tianqi Zhang, Haibo Zhou, Xuemin Shen
The uplink (UL)/downlink (DL) decoupled access has been emerging as a novel access architecture to improve the performance gains in cellular networks.
Spectral Efficiency Analysis of Uplink-Downlink Decoupled Access in C-V2X Networks
no code implementations • 1 Dec 2022 • Meng Wang, Kai Yu, Chun-Mei Feng, Ke Zou, Yanyu Xu, Qingquan Meng, Rick Siow Mong Goh, Yong Liu, Huazhu Fu
Specifically, aiming to improve the model's ability to learn the complex pathological features of retinal edema lesions in OCT images, we develop a novel segmentation backbone that integrates a wavelet-enhanced feature extractor network and a newly designed multi-scale transformer module.
no code implementations • 17 Nov 2022 • Yiwei Guo, Chenpeng Du, Xie Chen, Kai Yu
Specifically, instead of being guided with a one-hot vector for the specified emotion, EmoDiff is guided with a soft label where the values of the specified emotion and Neutral are set to $\alpha$ and $1-\alpha$ respectively.
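Constructing the soft label described above can be sketched as follows. The helper name and emotion inventory are illustrative; only the $\alpha$ / $1-\alpha$ weighting comes from the abstract:

```python
def soft_emotion_label(emotions, target, alpha):
    """Soft guidance label: weight `alpha` on the target emotion and
    1 - alpha on Neutral, zero elsewhere (vs. a one-hot vector)."""
    assert "Neutral" in emotions and target in emotions
    label = {e: 0.0 for e in emotions}
    label[target] = alpha
    label["Neutral"] += 1.0 - alpha   # += keeps target="Neutral" valid
    return [label[e] for e in emotions]

# Hypothetical emotion set; weight 0.7 on Happy, 0.3 on Neutral.
emotions = ["Neutral", "Happy", "Sad", "Angry"]
vec = soft_emotion_label(emotions, target="Happy", alpha=0.7)
```

Setting `alpha=1.0` recovers the usual one-hot guidance, so the soft label generalizes rather than replaces it.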
2 code implementations • 8 Nov 2022 • Tao Liu, Kai Yu
DER is the primary metric for evaluating diarization performance, but it faces a dilemma: the errors in short utterances or segments tend to be overwhelmed by longer ones.
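The dilemma can be made concrete with the standard time-weighted DER definition (a minimal sketch; the function name and the toy durations are illustrative):

```python
def der(miss, false_alarm, confusion, total_speech):
    """Diarization Error Rate: time-weighted sum of missed speech,
    false-alarm speech and speaker-confusion time, divided by the
    total reference speech time."""
    return (miss + false_alarm + confusion) / total_speech

# Two utterances: a 60 s segment labelled perfectly, and a 2 s segment
# attributed entirely to the wrong speaker. The short utterance is 100%
# wrong, yet time weighting makes it nearly invisible in the overall DER.
rate = der(miss=0.0, false_alarm=0.0, confusion=2.0, total_speech=62.0)
```

Here `rate` is about 0.032, i.e. roughly 3% DER despite one utterance being completely misattributed, which is exactly the short-vs-long imbalance the abstract points at.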
no code implementations • 10 Sep 2022 • Zhi Chen, Yuncong Liu, Lu Chen, Su Zhu, Mengyue Wu, Kai Yu
The second phase is to fine-tune the pretrained model on the TOD data.
no code implementations • 25 May 2022 • Zhi Chen, Jijia Bao, Lu Chen, Yuncong Liu, Da Ma, Bei Chen, Mengyue Wu, Su Zhu, Xin Dong, Fujiang Ge, Qingliang Miao, Jian-Guang Lou, Kai Yu
In this work, we aim to build a unified dialogue foundation model (DFM) which can be used to solve massive diverse dialogue tasks.
no code implementations • 24 May 2022 • Binwei Yao, Chao Shi, Likai Zou, Lingfeng Dai, Mengyue Wu, Lu Chen, Zhen Wang, Kai Yu
In a depression-diagnosis-directed clinical session, doctors initiate a conversation with ample emotional support that guides the patients to expose their symptoms based on clinical diagnosis criteria.
no code implementations • 23 May 2022 • Liangtai Sun, Xingyu Chen, Lu Chen, Tianle Dai, Zichen Zhu, Kai Yu
However, this API-based architecture greatly limits the information-searching capability of intelligent assistants and may even lead to task failure if TOD-specific APIs are not available or the task is too complicated to be executed by the provided APIs.
1 code implementation • NAACL 2022 • Zihan Zhao, Lu Chen, Ruisheng Cao, Hongshen Xu, Xingyu Chen, Kai Yu
Recently, the structural reading comprehension (SRC) task on web pages has attracted increasing research interests.
no code implementations • 29 Apr 2022 • Wen Wu, Mengyue Wu, Kai Yu
Automatic depression detection has attracted an increasing amount of attention but remains a challenging task.
no code implementations • SIGDIAL (ACL) 2022 • Zhi Chen, Lu Chen, Bei Chen, Libo Qin, Yuncong Liu, Su Zhu, Jian-Guang Lou, Kai Yu
With the development of pre-trained language models, remarkable success has been witnessed in dialogue understanding (DU).
no code implementations • 2 Apr 2022 • Chenpeng Du, Yiwei Guo, Xie Chen, Kai Yu
The mainstream neural text-to-speech (TTS) pipeline is a cascade system, including an acoustic model (AM) that predicts acoustic features from the input transcript and a vocoder that generates the waveform according to the given acoustic features.
no code implementations • 25 Mar 2022 • Siyu Lou, Xuenan Xu, Mengyue Wu, Kai Yu
Using pre-trained audio features and a descriptor-based aggregation method, we build our contextual audio-text retrieval system.
no code implementations • 15 Feb 2022 • Yiwei Guo, Chenpeng Du, Kai Yu
Although word-level prosody modeling in neural text-to-speech (TTS) has been investigated in recent research for diverse speech synthesis, it is still challenging to control speech synthesis manually without a specific reference.
no code implementations • 9 Dec 2021 • Su Zhu, Lu Chen, Ruisheng Cao, Zhi Chen, Qingliang Miao, Kai Yu
In this paper, we propose to improve prototypical networks with vector projection distance and abstract triangular Conditional Random Field (CRF) for the few-shot NLU.
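One plausible reading of the vector projection distance is the scalar projection of an utterance embedding onto each label prototype's direction; the sketch below is an assumption for illustration (the paper's exact formulation may differ, e.g. by including bias terms), and all names are hypothetical:

```python
import math

def projection_score(x, proto):
    """Scalar projection of embedding x onto the prototype direction:
    (x . proto) / ||proto||.  Larger score = closer to the prototype."""
    norm = math.sqrt(sum(p * p for p in proto))
    return sum(a * b for a, b in zip(x, proto)) / norm

def classify(x, prototypes):
    """Assign x to the label whose prototype gives the highest score."""
    return max(prototypes, key=lambda c: projection_score(x, prototypes[c]))

# Toy 2-D prototypes for two intents (purely illustrative).
protos = {"weather": [2.0, 0.0], "music": [0.0, 1.0]}
label = classify([1.0, 0.2], protos)
```

Unlike Euclidean distance, the projection score is invariant to the prototype's length along its own direction, which is one motivation sometimes given for projection-based metrics in few-shot classification.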
1 code implementation • 3 Sep 2021 • Chun-Mei Feng, Yunlu Yan, Kai Yu, Yong Xu, Ling Shao, Huazhu Fu
Our SANet can explore high-intensity and low-intensity regions in the "forward" and "reverse" directions with the help of the auxiliary contrast, while learning clearer anatomical structure and edge information for the SR of a target-contrast MR image.
1 code implementation • DCASE Challenge 2021 • Xuenan Xu, Zeyu Xie, Mengyue Wu, Kai Yu
This report proposes an audio captioning system for the Detection and Classification of Acoustic Scenes and Events (DCASE) 2021 challenge task Task 6.
Ranked #2 on Audio captioning on Clotho (using extra training data)
no code implementations • Findings (ACL) 2021 • Zhi Chen, Lu Chen, Hanqi Li, Ruisheng Cao, Da Ma, Mengyue Wu, Kai Yu
A dual learning approach is also proposed for the utterance rewrite model to address the data sparsity problem.
1 code implementation • ACL 2021 • Ruisheng Cao, Lu Chen, Zhi Chen, Yanbin Zhao, Su Zhu, Kai Yu
This work aims to tackle the challenging heterogeneous graph encoding problem in the text-to-SQL task.
no code implementations • NAACL 2021 • Zhi Chen, Lu Chen, Yanbin Zhao, Ruisheng Cao, Zihan Xu, Su Zhu, Kai Yu
Given a database schema, Text-to-SQL aims to translate a natural language question into the corresponding SQL query.
no code implementations • 4 Mar 2021 • Kai Yu, Gong-De Guo, Song Lin
In this paper, we present a quantum algorithm and a quantum circuit to efficiently perform linear discriminant analysis (LDA) for dimensionality reduction.
Dimensionality Reduction · Quantum Physics
1 code implementation • 25 Feb 2021 • Boer Lyu, Lu Chen, Su Zhu, Kai Yu
Additionally, we adopt the word lattice graph as input to maintain multi-granularity information.
2 code implementations • 1 Feb 2021 • Chenpeng Du, Kai Yu
Generating natural speech with diverse and smooth prosody pattern is a challenging task.
1 code implementation • EMNLP 2021 • Xingyu Chen, Zihan Zhao, Lu Chen, Danyang Zhang, Jiabao Ji, Ao Luo, Yuxuan Xiong, Kai Yu
In this paper, we introduce the task of structural reading comprehension (SRC) on web.
1 code implementation • 19 Jan 2021 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Our model outperforms other approaches on the DCASE2018 and URBAN-SED datasets without requiring prior duration knowledge.
Data Augmentation · Sound Event Detection · Sound · Audio and Speech Processing
no code implementations • 17 Jan 2021 • Jinye Peng, Jiaxin Wang, Jun Wang, Erlei Zhang, Qunxi Zhang, Yongqin Zhang, Xianlin Peng, Kai Yu
For the fine extraction stage, we design a new multiscale U-Net (MSU-Net) to effectively remove disease noise and refine the sketch.
no code implementations • 17 Jan 2021 • YingJie Xu, Kai Yu, Li Li, Xianfu Lei, Li Hao, Cheng-Xiang Wang
As a potential development direction of future transportation, vacuum tube ultra-high-speed train (UHST) wireless communication systems exhibit channel characteristics that differ from existing high-speed train (HST) scenarios.
no code implementations • 14 Oct 2020 • Zihan Zhao, Yuncong Liu, Lu Chen, Qi Liu, Rao Ma, Kai Yu
Recently, pre-trained language models like BERT have shown promising performance on multiple natural language processing tasks.
no code implementations • 22 Sep 2020 • Zhi Chen, Lu Chen, Zihan Xu, Yanbin Zhao, Su Zhu, Kai Yu
In dialogue systems, a dialogue state tracker aims to accurately find a compact representation of the current dialogue status, based on the entire dialogue history.
no code implementations • 22 Sep 2020 • Zhi Chen, Xiaoyuan Liu, Lu Chen, Kai Yu
A novel ComNet is proposed to model the structure of a hierarchical agent.
no code implementations • 22 Sep 2020 • Zhi Chen, Lu Chen, Xiang Zhou, Kai Yu
To the best of our knowledge, this is the first effort to optimize the DST module within DRL framework for on-line task-oriented spoken dialogue systems.
no code implementations • 22 Sep 2020 • Zhi Chen, Lu Chen, Yanbin Zhao, Su Zhu, Kai Yu
In task-oriented multi-turn dialogue systems, dialogue state refers to a compact representation of the user goal in the context of dialogue history.
no code implementations • 22 Sep 2020 • Zhi Chen, Lu Chen, Xiaoyuan Liu, Kai Yu
The task-oriented spoken dialogue system (SDS) aims to assist a human user in accomplishing a specific task (e.g., hotel booking).
1 code implementation • 21 Sep 2020 • Su Zhu, Ruisheng Cao, Lu Chen, Kai Yu
Few-shot slot tagging becomes appealing for rapid domain transfer and adaptation, motivated by the tremendous development of conversational dialogue systems.
no code implementations • 7 Sep 2020 • Chen Liu, Su Zhu, Lu Chen, Kai Yu
The framework consists of a slot tagging model and a rule-based value error recovery module.
Automatic Speech Recognition (ASR) +3
no code implementations • 31 Jul 2020 • Qi Liu, Tian Tan, Kai Yu
It is concluded that beta stabilizer parameters can reduce sensitivity to the learning rate while achieving almost the same performance on DNNs with the ReLU activation function and on LSTMs.
no code implementations • 31 Jul 2020 • Qi Liu, Yanmin Qian, Kai Yu
For speech recognition rescoring, although the proposed LSTM LM obtains only very slight gains, the new model appears to be strongly complementary to the conventional LSTM LM.
no code implementations • ACL 2020 • Lu Chen, Yanbin Zhao, Boer Lyu, Lesheng Jin, Zhi Chen, Su Zhu, Kai Yu
Chinese short text matching usually employs word sequences rather than character sequences to get better performance.
no code implementations • ACL 2020 • Yanbin Zhao, Lu Chen, Zhi Chen, Ruisheng Cao, Su Zhu, Kai Yu
We also adopt graph attention networks with higher-order neighborhood information to encode the rich structure in AMR graphs.
no code implementations • 5 Jun 2020 • Ashwini Badgujar, Sheng Chen, Andrew Wang, Kai Yu, Paul Intrevado, David Guy Brizan
In this research, we continuously collect data from the RSS feeds of traditional news sources.
1 code implementation • ACL 2020 • Ruisheng Cao, Su Zhu, Chenyu Yang, Chen Liu, Rao Ma, Yanbin Zhao, Lu Chen, Kai Yu
One daunting problem for semantic parsing is the scarcity of annotation.
1 code implementation • 24 May 2020 • Chen Liu, Su Zhu, Zijian Zhao, Ruisheng Cao, Lu Chen, Kai Yu
In this paper, a novel BERT based SLU model (WCN-BERT SLU) is proposed to encode WCNs and the dialogue context jointly.
no code implementations • 30 Apr 2020 • Yanbin Zhao, Lu Chen, Zhi Chen, Kai Yu
When modeling simple and complex sentences with autoencoders, we introduce different types of noise into the training process.
2 code implementations • 26 Apr 2020 • Su Zhu, Ruisheng Cao, Kai Yu
The framework is composed of dual pseudo-labeling and dual learning method, which enables an NLU model to make full use of data (labeled and unlabeled) through a closed-loop of the primal and dual tasks.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Su Zhu, Jieyu Li, Lu Chen, Kai Yu
In this paper, a novel context and schema fusion network is proposed to encode the dialogue context and schema graph by using internal and external attention mechanisms.
Ranked #8 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.0
Dialogue State Tracking · Multi-domain Dialogue State Tracking
no code implementations • 3 Apr 2020 • Lu Chen, Boer Lyu, Chi Wang, Su Zhu, Bowen Tan, Kai Yu
For multi-domain DST, the data sparsity problem is also a major obstacle due to the increased number of state candidates.
Ranked #12 on Multi-domain Dialogue State Tracking on MULTIWOZ 2.1
1 code implementation • 27 Mar 2020 • Heinrich Dinkel, Yefei Chen, Mengyue Wu, Kai Yu
We proposed two GPVAD models, one full (GPV-F), trained on 527 Audioset sound events, and one binary (GPV-B), only distinguishing speech and noise.
Sound · Audio and Speech Processing
no code implementations • 22 Mar 2020 • Su Zhu, Zijian Zhao, Rao Ma, Kai Yu
The proposed approaches are evaluated on three datasets.
1 code implementation • IJCNLP 2019 • Zijian Zhao, Su Zhu, Kai Yu
The atomic templates produce exemplars for fine-grained constituents of semantic representations.
1 code implementation • ACL 2019 • Ruisheng Cao, Su Zhu, Chen Liu, Jieyu Li, Kai Yu
Semantic parsing converts natural language queries into structured logical forms.
no code implementations • 18 Jun 2019 • Xu Xiang, Shuai Wang, Houjun Huang, Yanmin Qian, Kai Yu
The proposed approach can achieve state-of-the-art performance, with 25%~30% equal error rate (EER) reduction on both tasks compared to strong baselines using cross-entropy loss with softmax, obtaining 2.238% EER on the VoxCeleb1 test set and 2.761% EER on the SITW core-core test set, respectively.
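EER, the metric reported above, is the operating point where the false-accept and false-reject rates are equal. A simple threshold-scan approximation can be sketched as follows (illustrative only, not an official scoring tool; names and toy scores are hypothetical):

```python
def eer(genuine, impostor):
    """Approximate Equal Error Rate: scan thresholds over all observed
    scores and return the point where the false-accept rate (impostors
    accepted) and false-reject rate (genuines rejected) are closest."""
    best = None
    for t in sorted(set(genuine) | set(impostor)):
        far = sum(s >= t for s in impostor) / len(impostor)
        frr = sum(s < t for s in genuine) / len(genuine)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy verification scores: higher = more likely the same speaker.
genuine = [0.9, 0.8, 0.3]
impostor = [0.7, 0.2, 0.1]
rate = eer(genuine, impostor)
```

With these toy scores the two error rates cross at 1/3: at threshold 0.7 one of three impostors is accepted and one of three genuine trials is rejected.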
1 code implementation • 31 May 2019 • Xuenan Xu, Heinrich Dinkel, Mengyue Wu, Kai Yu
Captioning has attracted much attention in image and video understanding while a small amount of work examines audio captioning.
no code implementations • 27 May 2019 • Lu Chen, Zhi Chen, Bowen Tan, Sishan Long, Milica Gasic, Kai Yu
Experiments show that AgentGraph models significantly outperform traditional reinforcement learning approaches on most of the 18 tasks of the PyDial benchmark.
1 code implementation • 9 Apr 2019 • Zijian Zhao, Su Zhu, Kai Yu
In the paper, we focus on spoken language understanding from unaligned data whose annotation is a set of act-slot-value triples.
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Mengyue Wu, Kai Yu
Previous text-based depression detection is commonly based on large user-generated data.
1 code implementation • 8 Apr 2019 • Heinrich Dinkel, Kai Yu
Task 4 of the Dcase2018 challenge demonstrated that substantially more research is needed for a real-world application of sound event detection.
Sound · Audio and Speech Processing
1 code implementation • 25 Feb 2019 • Mengyue Wu, Heinrich Dinkel, Kai Yu
A baseline encoder-decoder model is provided for both English and Mandarin.
no code implementations • 5 Nov 2018 • Xuankai Chang, Yanmin Qian, Kai Yu, Shinji Watanabe
The experiments demonstrate that the proposed methods can improve the performance of the end-to-end model in separating the overlapping speech and recognizing the separated streams.
Automatic Speech Recognition (ASR) +2
1 code implementation • EMNLP 2018 • Liliang Ren, Kaige Xie, Lu Chen, Kai Yu
Dialogue state tracking is the core part of a spoken dialogue system.
no code implementations • 2 Aug 2018 • Zhehuai Chen, Yanmin Qian, Kai Yu
The few studies on sequence discriminative training for KWS are limited to fixed-vocabulary or LVCSR-based methods and have not been compared to state-of-the-art deep learning based KWS approaches.
no code implementations • COLING 2018 • Lu Chen, Bowen Tan, Sishan Long, Kai Yu
The proposed structured deep reinforcement learning is based on graph neural networks (GNN), which consists of some sub-networks, each one for a node on a directed graph.
no code implementations • WS 2018 • Kaige Xie, Cheng Chang, Liliang Ren, Lu Chen, Kai Yu
Dialogue state tracking (DST), when formulated as a supervised learning problem, relies on labelled data.
no code implementations • NAACL 2018 • Xuan Liu, Di Cao, Kai Yu
Although excellent performance is obtained for large vocabulary tasks, tremendous memory consumption prohibits the use of LSTM LM in low-resource devices.
Automatic Speech Recognition (ASR) +2
no code implementations • 3 Mar 2018 • Zhehuai Chen, Qi Liu, Hao Li, Kai Yu
Finally, modules are integrated into an acousticsto-word model (A2W) and jointly optimized using acoustic data to retain the advantage of sequence modeling.
Automatic Speech Recognition (ASR) +2
no code implementations • EMNLP 2017 • Cheng Chang, Runzhe Yang, Lu Chen, Xiang Zhou, Kai Yu
The key to building an evolvable dialogue system in real-world scenarios is to ensure an affordable on-line dialogue policy learning, which requires the on-line learning process to be safe, efficient and economical.
no code implementations • EMNLP 2017 • Lu Chen, Xiang Zhou, Cheng Chang, Runzhe Yang, Kai Yu
Hand-crafted rules and reinforcement learning (RL) are two popular choices to obtain dialogue policy.
no code implementations • WS 2018 • Su Zhu, Kai Yu
Concept definition is important in language understanding (LU) adaptation since literal definition difference can easily lead to data sparsity even if different data sets are actually semantically correlated.
no code implementations • EACL 2017 • Lu Chen, Runzhe Yang, Cheng Chang, Zihao Ye, Xiang Zhou, Kai Yu
On-line dialogue policy learning is the key for building evolvable conversational agent in real world scenarios.
no code implementations • 29 Nov 2016 • Kai Yu, Yang Zhou, Da Li, Zhang Zhang, Kaiqi Huang
Visual surveillance systems have become one of the largest data sources of Big Visual Data in real world.
no code implementations • 17 Nov 2016 • Kai Yu, Biao Leng, Zhang Zhang, Dangwei Li, Kaiqi Huang
Based on GoogLeNet, firstly, a set of mid-level attribute features are discovered by newly designed detection layers, where a max-pooling based weakly-supervised object detection technique is used to train these layers with only image-level labels, without the need for bounding box annotations of pedestrian attributes.
no code implementations • 6 Aug 2016 • Su Zhu, Kai Yu
This paper investigates the framework of encoder-decoder with attention for sequence labelling based spoken language understanding.
no code implementations • ICCV 2015 • Shangxuan Tian, Yifeng Pan, Chang Huang, Shijian Lu, Kai Yu, Chew Lim Tan
With character candidates detected by cascade boosting, the min-cost flow network model integrates the last three sequential steps into a single process which solves the error accumulation problem at both character level and text line level effectively.
1 code implementation • 19 Feb 2016 • Tianxing He, Yu Zhang, Jasha Droppo, Kai Yu
We propose to train a bi-directional neural network language model (NNLM) with noise contrastive estimation (NCE).
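The NCE objective for one training position can be sketched as below: the observed word should be classified as "data" against k sampled noise words. For brevity this sketch assumes a uniform noise distribution, so log(k·q(w)) collapses to one constant; all names are illustrative:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def nce_loss(data_logit, noise_logits, log_kq):
    """NCE loss for one position.  `data_logit` is the unnormalised model
    log-score of the observed word, `noise_logits` the scores of the k
    sampled noise words, and `log_kq` = log(k * q(word)) under the
    (assumed uniform) noise distribution.  sigmoid(s - log_kq) is the
    posterior probability that a word came from the data."""
    loss = -math.log(sigmoid(data_logit - log_kq))        # true word -> "data"
    for z in noise_logits:
        loss -= math.log(1.0 - sigmoid(z - log_kq))       # noise word -> "noise"
    return loss
```

Because the loss only needs unnormalised scores, training avoids the full softmax over the vocabulary, which is the usual motivation for NCE in language modelling.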
25 code implementations • 9 Aug 2015 • Zhiheng Huang, Wei Xu, Kai Yu
It can also use sentence level tag information thanks to a CRF layer.
Ranked #1 on Named Entity Recognition (NER) on FindVehicle
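The sentence-level tag information contributed by a CRF layer comes from decoding with learned transition scores rather than picking each tag independently. A tiny Viterbi sketch (illustrative, not the paper's implementation; tags, scores, and names are toy values):

```python
def viterbi(emissions, transitions, tags):
    """Viterbi decoding: choose the tag sequence maximising per-token
    emission scores plus pairwise transition scores, so each tag choice
    depends on its neighbours (missing transitions default to 0)."""
    score = {t: emissions[0][t] for t in tags}
    back = []
    for i in range(1, len(emissions)):
        new, ptr = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: score[p] + transitions.get((p, t), 0.0))
            new[t] = score[prev] + transitions.get((prev, t), 0.0) + emissions[i][t]
            ptr[t] = prev
        score, back = new, back + [ptr]
    last = max(tags, key=score.get)
    path = [last]
    for ptr in reversed(back):          # follow back-pointers
        path.append(ptr[path[-1]])
    return path[::-1]

# Two tokens: the second token's emissions alone favour "O", but a strong
# B->I transition bonus flips the sequence-level decision to "I".
emissions = [{"B": 1.0, "I": 0.0, "O": 0.5},
             {"B": 0.0, "I": 0.9, "O": 1.0}]
transitions = {("B", "I"): 2.0, ("O", "I"): -5.0}
path = viterbi(emissions, transitions, ["B", "I", "O"])
```

A per-token argmax would emit `["B", "O"]`; the transition scores are what let the decoder use sentence-level tag dependencies.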
no code implementations • 14 Jul 2015 • Kai Sun, Qizhe Xie, Kai Yu
Dialogue state tracking (DST) is a process to estimate the distribution of the dialogue states as a dialogue progresses.
no code implementations • CVPR 2015 • Jiajun Wu, Yinan Yu, Chang Huang, Kai Yu
The recent development in learning deep representations has demonstrated its wide applications in traditional vision tasks like classification and detection.
no code implementations • NeurIPS 2014 • Mu Li, David G. Andersen, Alexander J. Smola, Kai Yu
This paper describes a third-generation parameter server framework for distributed machine learning.
no code implementations • 26 Sep 2013 • Krishnakumar Balasubramanian, Kai Yu, Tong Zhang
The traditional convex formulation employs the group Lasso relaxation to achieve joint sparsity across tasks.
1 code implementation • 25 Dec 2012 • Chang Huang, Shenghuo Zhu, Kai Yu
Learning Mahalanobis distance metrics in a high-dimensional feature space is very difficult, especially when structural sparsity and low rank are enforced to improve computational efficiency in the testing phase.
no code implementations • NeurIPS 2010 • Yuanqing Lin, Tong Zhang, Shenghuo Zhu, Kai Yu
This paper proposes a principled extension of the traditional single-layer flat sparse coding scheme, where a two-layer coding scheme is derived based on theoretical analysis of nonlinear functional approximation that extends recent results for local coordinate coding.
no code implementations • NeurIPS 2009 • Kai Yu, Tong Zhang, Yihong Gong
This paper introduces a new method for semi-supervised learning on high dimensional nonlinear manifolds, which includes a phase of unsupervised basis learning and a phase of supervised function learning.
no code implementations • NeurIPS 2008 • Shenghuo Zhu, Kai Yu, Yihong Gong
Stochastic relational models provide a rich family of choices for learning and predicting dyadic data between two sets of entities.
no code implementations • NeurIPS 2008 • Kai Yu, Wei Xu, Yihong Gong
In this paper we focus on training deep neural networks for visual recognition tasks.
no code implementations • NeurIPS 2007 • Shenghuo Zhu, Kai Yu, Yihong Gong
It is becoming increasingly important to learn from a partially-observed random matrix and predict its missing elements.
no code implementations • NeurIPS 2007 • Kai Yu, Wei Chu
In this paper we develop a Gaussian process (GP) framework to model a collection of reciprocal random variables defined on the \emph{edges} of a network.