no code implementations • COLING 2022 • Lisung Chen, Nuo Chen, Yuexian Zou, Yong Wang, Xinzhong Sun
Furthermore, we propose a threshold-free multi-intent classifier that utilizes the output of the IND task and detects multiple intents without depending on a threshold.
no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Jiayin Hu, Yu Tian, Deshun Yang, Bang Yang, Cindy Yang, Zihao Li, Zaoshan Huang, Yuexian Zou
It seamlessly integrates various SOTA vision models, automates the selection of SOTA vision models, identifies suitable 3D mesh creation algorithms corresponding to 2D depth map analysis, and generates optimal results based on diverse multimodal inputs such as text prompts.
no code implementations • 14 Mar 2024 • Chris Kelly, Luhui Hu, Bang Yang, Yu Tian, Deshun Yang, Cindy Yang, Zaoshan Huang, Zihao Li, Jiayin Hu, Yuexian Zou
With the emergence of large language models (LLMs) and vision foundation models, how to combine the intelligence and capacity of these open-sourced or API-available models to achieve open-world visual perception remains an open question.
no code implementations • 10 Mar 2024 • Deshun Yang, Luhui Hu, Yu Tian, Zihao Li, Chris Kelly, Bang Yang, Cindy Yang, Yuexian Zou
Several text-to-video diffusion models have demonstrated commendable capabilities in synthesizing high-quality video content.
no code implementations • 2 Mar 2024 • Chenchen Tao, Chong Wang, Yuexian Zou, Xiaohao Peng, Jiafei Wu, Jiangbo Qian
Most models for weakly supervised video anomaly detection (WS-VAD) rely on multiple instance learning, aiming to distinguish normal and abnormal snippets without specifying the type of anomaly.
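Multiple instance learning for WS-VAD is commonly trained with a top-k ranking objective over snippet-level anomaly scores. The sketch below is a minimal, hypothetical illustration of that generic objective in NumPy, not the authors' code; the function name and margin value are assumptions.

```python
import numpy as np

def mil_ranking_loss(normal_scores, abnormal_scores, k=3, margin=1.0):
    """Hinge-style MIL ranking loss: the mean of the top-k snippet scores
    of an abnormal video should exceed that of a normal video by a margin."""
    top_n = np.sort(normal_scores)[::-1][:k].mean()
    top_a = np.sort(abnormal_scores)[::-1][:k].mean()
    return max(0.0, margin - top_a + top_n)

# Toy snippet-level anomaly scores for one normal and one abnormal video.
normal = np.array([0.1, 0.2, 0.15, 0.05])
abnormal = np.array([0.9, 0.8, 0.85, 0.1])
loss = mil_ranking_loss(normal, abnormal, k=2)
```

Only video-level labels (normal vs. abnormal) are needed; the loss pushes the most anomalous snippets apart without snippet-level supervision.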
no code implementations • 27 Feb 2024 • Bowen Cao, Deng Cai, Leyang Cui, Xuxin Cheng, Wei Bi, Yuexian Zou, Shuming Shi
To address this, we propose to initialize the training oracles using linguistic heuristics and, more importantly, bootstrap the oracles through iterative self-reinforcement.
1 code implementation • 30 Jan 2024 • Bang Yang, Yong Dai, Xuxin Cheng, Yaowei Li, Asif Raza, Yuexian Zou
To alleviate CF raised by covariate shift and lexical overlap, we further propose a novel approach that ensures the identical distribution of all token embeddings during initialization and regularizes token embedding learning during training.
no code implementations • 19 Nov 2023 • Xuxin Cheng, Bowen Cao, Qichen Ye, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Specifically, in fine-tuning, we apply mutual learning and train two SLU models on the manual transcripts and the ASR transcripts, respectively, aiming to iteratively share knowledge between these two models.
Automatic Speech Recognition (ASR) +4
1 code implementation • 16 Nov 2023 • Chris Kelly, Luhui Hu, Cindy Yang, Yu Tian, Deshun Yang, Bang Yang, Zaoshan Huang, Zihao Li, Yuexian Zou
In the current landscape of artificial intelligence, foundation models serve as the bedrock for advancements in both language and vision domains.
no code implementations • 25 Oct 2023 • Ji Jiang, Meng Cao, Tengtao Song, Long Chen, Yi Wang, Yuexian Zou
Video Referring Expression Comprehension (REC) aims to localize a target object in videos based on the queried natural language.
1 code implementation • 25 Aug 2023 • Bang Yang, Fenglin Liu, Xian Wu, YaoWei Wang, Xu Sun, Yuexian Zou
To deal with the label shortage problem, we present a simple yet effective zero-shot approach MultiCapCLIP that can generate visual captions for different scenarios and languages without any labeled vision-caption pairs of downstream datasets.
1 code implementation • ICCV 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Yaowei Li, Zhihong Zhu, Yuexian Zou
Due to two annoying issues in video grounding: (1) the co-existence of some visual entities in both the ground truth and other moments, i.e., semantic overlapping; and (2) only a few moments in the video being annotated, i.e., the sparse annotation dilemma, vanilla contrastive learning is unable to model the correlations between temporally distant moments and learns inconsistent video representations.
no code implementations • 5 Jul 2023 • Bang Yang, Fenglin Liu, Zheng Li, Qingyu Yin, Chenyu You, Bing Yin, Yuexian Zou
We observe that the core challenges of novel product title generation are the understanding of novel product characteristics and the generation of titles in a novel writing style.
no code implementations • 9 Jun 2023 • Bang Yang, Asif Raza, Yuexian Zou, Tong Zhang
In this work, we propose customizing off-the-shelf general-purpose large-scale pre-trained models, i.e., foundation models (FMs), in computer vision and natural language processing, with a specific focus on medical report generation.
no code implementations • 30 Mar 2023 • Hao An, Dongsheng Chen, Weiyuan Xu, Zhihong Zhu, Yuexian Zou
However, these methods are not able to fully leverage the trigger information and even bring noise to relation extraction.
3 code implementations • 30 Mar 2023 • Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang
To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.
Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)
no code implementations • ICCV 2023 • Yaowei Li, Bang Yang, Xuxin Cheng, Zhihong Zhu, Hongxiang Li, Yuexian Zou
Automatic radiology report generation has attracted enormous research interest due to its practical value in reducing the workload of radiologists.
1 code implementation • 15 Mar 2023 • Ziyu Yao, Xuxin Cheng, Yuexian Zou
Moreover, we introduce a pose-level method, PoseRAC, which is based on this representation and achieves state-of-the-art performance on two new-version datasets by using Pose Saliency Annotation to annotate salient poses for training.
Ranked #2 on Repetitive Action Counting on RepCount (using extra training data)
no code implementations • 12 Mar 2023 • Tengtao Song, Nuo Chen, Ji Jiang, Zhihong Zhu, Yuexian Zou
Since incorporating syntactic information such as dependency structures into neural models can promote a better understanding of sentences, such methods have been widely used in NLP tasks.
1 code implementation • 11 Mar 2023 • Bang Yang, Fenglin Liu, Yuexian Zou, Xian Wu, YaoWei Wang, David A. Clifton
We present the results of extensive experiments on twelve NLG tasks, showing that, without using any labeled downstream pairs for training, ZeroNLG generates high-quality and believable outputs and significantly outperforms existing zero-shot methods.
no code implementations • 10 Mar 2023 • Yifei Xin, Dongchao Yang, Fan Cui, Yujun Wang, Yuexian Zou
Existing weakly supervised sound event detection (WSSED) work has not explored both types of co-occurrences simultaneously, i.e., some sound events often co-occur, and their occurrences are usually accompanied by specific background sounds, so they would be inevitably entangled, causing misclassification and biased localization results with only clip-level supervision.
1 code implementation • 23 Feb 2023 • Qichen Ye, Bowen Cao, Nuo Chen, Weiyuan Xu, Yuexian Zou
Despite the promising result of recent KAQA systems which tend to integrate linguistic knowledge from pre-trained language models (PLM) and factual knowledge from knowledge graphs (KG) to answer complex questions, a bottleneck exists in effectively fusing the representations from PLMs and KGs because of (i) the semantic and distributional gaps between them, and (ii) the difficulties in joint reasoning over the provided knowledge from both modalities.
1 code implementation • 23 Feb 2023 • Bowen Cao, Qichen Ye, Weiyuan Xu, Yuexian Zou
Existing neighborhood aggregation strategies fail to capture either the short-term features or the long-term features of temporal graph attributes, leading to unsatisfactory model performance and even poor robustness and domain generality of the representation learning method.
no code implementations • 15 Jan 2023 • Hongxiang Li, Meng Cao, Xuxin Cheng, Zhihong Zhu, Yaowei Li, Yuexian Zou
Video grounding aims to locate a moment of interest matching the given query sentence from an untrimmed video.
no code implementations • CVPR 2023 • Meng Cao, Fangyun Wei, Can Xu, Xiubo Geng, Long Chen, Can Zhang, Yuexian Zou, Tao Shen, Daxin Jiang
Weakly-Supervised Video Grounding (WSVG) aims to localize events of interest in untrimmed videos with only video-level annotations.
no code implementations • 7 Dec 2022 • Xuxin Cheng, Qianqian Dong, Fengpeng Yue, Tom Ko, Mingxuan Wang, Yuexian Zou
How to solve the data scarcity problem for end-to-end speech-to-text translation (ST)?
no code implementations • 22 Nov 2022 • Fenglin Liu, Xian Wu, Chenyu You, Shen Ge, Yuexian Zou, Xu Sun
To this end, we introduce the unpaired video captioning task, which aims to train models without coupled video-caption pairs in the target language.
1 code implementation • 8 Nov 2022 • Zhihong Zhu, Weiyuan Xu, Xuxin Cheng, Tengtao Song, Yuexian Zou
Multi-intent detection and slot filling joint models are gaining increasing traction since they are closer to complicated real-world scenarios.
no code implementations • 28 Oct 2022 • Fenglin Liu, Xian Wu, Shen Ge, Xuancheng Ren, Wei Fan, Xu Sun, Yuexian Zou
To enhance the correlation between vision and language in disentangled spaces, we introduce the visual concepts to DiMBERT which represent visual information in textual format.
no code implementations • 19 Oct 2022 • Fenglin Liu, Xuancheng Ren, Xian Wu, Wei Fan, Yuexian Zou, Xu Sun
Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.
1 code implementation • 6 Oct 2022 • Ji Jiang, Meng Cao, Tengtao Song, Yuexian Zou
To this end, we introduce two new datasets (i.e., VID-Entity and VidSTG-Entity) by augmenting the VIDSentence and VidSTG datasets with the explicitly referred words in the whole sentence, respectively.
1 code implementation • 21 Jul 2022 • Meng Cao, Ji Jiang, Long Chen, Yuexian Zou
Extensive experiments demonstrate that our DCNet achieves state-of-the-art performance on both video and image REC benchmarks.
1 code implementation • 21 Jul 2022 • Meng Cao, Tianyu Yang, Junwu Weng, Can Zhang, Jue Wang, Yuexian Zou
To further enhance the temporal reasoning ability of the learned feature, we propose a context projection head and a temporal aware contrastive loss to perceive the contextual relationships.
1 code implementation • 20 Jul 2022 • Dongchao Yang, Jianwei Yu, Helin Wang, Wen Wang, Chao Weng, Yuexian Zou, Dong Yu
In this study, we investigate generating sound conditioned on a text prompt and propose a novel text-to-sound generation framework that consists of a text encoder, a Vector Quantized Variational Autoencoder (VQ-VAE), a decoder, and a vocoder.
Ranked #13 on Audio Generation on AudioCaps
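The VQ-VAE at the heart of such text-to-sound pipelines replaces each continuous latent vector with its nearest codebook entry. Below is a minimal, hypothetical NumPy sketch of that quantization bottleneck only (not the authors' full framework); the function name, codebook, and shapes are illustrative assumptions.

```python
import numpy as np

def vector_quantize(z, codebook):
    """Nearest-neighbor codebook lookup, the core VQ-VAE bottleneck:
    each latent vector is replaced by its closest codebook entry."""
    # Squared Euclidean distances between every latent and every code: (N, K)
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(1)
    return codebook[idx], idx

# Toy 2-D latents quantized against a 3-entry codebook.
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]])
z = np.array([[0.1, -0.1], [1.9, 0.2]])
quantized, idx = vector_quantize(z, codebook)
```

The discrete indices `idx` are what a token-level decoder (and downstream vocoder) would consume.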
no code implementations • ACL 2021 • Fenglin Liu, Shen Ge, Yuexian Zou, Xian Wu
The medical report generation task, which aims to produce long and coherent descriptions of medical images, has attracted growing research interest recently.
1 code implementation • 5 Jun 2022 • Jinchuan Tian, Jianwei Yu, Chunlei Zhang, Chao Weng, Yuexian Zou, Dong Yu
Experiments conducted on Mandarin-English code-switched speech suggest that the proposed LAE is capable of discriminating different languages at the frame level and shows superior performance on both monolingual and multilingual ASR tasks.
Automatic Speech Recognition (ASR) +1
no code implementations • 3 May 2022 • Xinmeng Xu, Rongzhi Gu, Yuexian Zou
Hand-crafted spatial features, such as inter-channel intensity difference (IID) and inter-channel phase difference (IPD), play a fundamental role in recent deep learning based dual-microphone speech enhancement (DMSE) systems.
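IID and IPD are both simple functions of the two channels' complex spectrograms. As a rough, hypothetical NumPy sketch (the function name and eps value are assumptions, not the paper's code), they can be computed per time-frequency bin as:

```python
import numpy as np

def spatial_features(spec_left, spec_right, eps=1e-8):
    """Compute inter-channel intensity difference (IID, in dB) and
    inter-channel phase difference (IPD, in radians) from two complex
    spectrograms of the same shape."""
    iid = 20.0 * np.log10((np.abs(spec_left) + eps) / (np.abs(spec_right) + eps))
    ipd = np.angle(spec_left * np.conj(spec_right))
    return iid, ipd

# Toy single-bin example: right channel lags left by pi/4 at half magnitude.
left = np.array([1.0 * np.exp(1j * 0.0)])
right = np.array([0.5 * np.exp(-1j * np.pi / 4)])
iid, ipd = spatial_features(left, right)
```

In practice these features are stacked with spectral features and fed to the enhancement network.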
no code implementations • Findings (NAACL) 2022 • Chenyu You, Nuo Chen, Fenglin Liu, Shen Ge, Xian Wu, Yuexian Zou
To evaluate the capacity of SCQA systems in a dialogue-style interaction, we assemble a Spoken Conversational Question Answering (Spoken-CoQA) dataset with more than 40k question-answer pairs from 4k conversations.
Ranked #1 on Spoken Language Understanding on Spoken-SQuAD
no code implementations • 15 Apr 2022 • Zifeng Zhao, Rongzhi Gu, Dongchao Yang, Jinchuan Tian, Yuexian Zou
Dominant research adopts supervised training for speaker extraction, while the scarcity of ideally clean corpora and the channel mismatch problem are rarely considered.
no code implementations • 4 Apr 2022 • Zifeng Zhao, Dongchao Yang, Rongzhi Gu, Haoran Zhang, Yuexian Zou
However, its performance is often inferior to that of a blind source separation (BSS) counterpart with a similar network architecture, because the auxiliary speaker encoder may sometimes generate ambiguous speaker embeddings.
no code implementations • 31 Mar 2022 • Liyu Wu, Can Zhang, Yuexian Zou
Inspired by the recent attention mechanism, we propose a multi-grain contextual focus module, termed MCF, to capture the action associated relation information from the body joints and parts.
no code implementations • 31 Mar 2022 • Li Wang, Rongzhi Gu, Weiji Zhuang, Peng Gao, Yujun Wang, Yuexian Zou
Bearing this in mind, a two-branch deep network (a KWS branch and an SV branch) with the same network structure is developed, and a novel decoupling feature learning method is proposed to improve the performance of KWS and SV simultaneously, where speaker-invariant keyword representations and keyword-invariant speaker representations are expected, respectively.
1 code implementation • 29 Mar 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
However, the effectiveness and efficiency of the MBR-based methods are compromised: the MBR criterion is only used in system training, which creates a mismatch between training and decoding; the on-the-fly decoding process in MBR-based methods results in the need for pre-trained models and slow training speeds.
Automatic Speech Recognition (ASR) +1
1 code implementation • CVPR 2022 • Can Zhang, Tianyu Yang, Junwu Weng, Meng Cao, Jue Wang, Yuexian Zou
These pre-trained models can be sub-optimal for temporal localization tasks due to the inherent discrepancy between video-level classification and clip-level localization.
1 code implementation • 6 Jan 2022 • Jinchuan Tian, Jianwei Yu, Chao Weng, Yuexian Zou, Dong Yu
Then, the LM score of the hypothesis is obtained by intersecting the generated lattice with an external word N-gram LM.
Automatic Speech Recognition (ASR) +2
1 code implementation • 5 Dec 2021 • Jinchuan Tian, Jianwei Yu, Chao Weng, Shi-Xiong Zhang, Dan Su, Dong Yu, Yuexian Zou
Recently, End-to-End (E2E) frameworks have achieved remarkable results on various Automatic Speech Recognition (ASR) tasks.
Automatic Speech Recognition (ASR) +1
1 code implementation • 30 Nov 2021 • Bang Yang, Tong Zhang, Yuexian Zou
DCD is an auxiliary task that requires a caption model to learn the correspondence between video content and concepts and the co-occurrence relations between concepts.
Ranked #16 on Video Captioning on MSR-VTT
no code implementations • 18 Sep 2021 • Dongsheng Chen, Zhiqi Huang, Xian Wu, Shen Ge, Yuexian Zou
Intent detection (ID) and Slot filling (SF) are two major tasks in spoken language understanding (SLU).
no code implementations • EMNLP 2021 • Meng Cao, Long Chen, Mike Zheng Shou, Can Zhang, Yuexian Zou
Almost all existing video grounding methods fall into two frameworks: 1) Top-down model: It predefines a set of segment candidates and then conducts segment classification and regression.
no code implementations • Findings (EMNLP) 2021 • Chenyu You, Nuo Chen, Yuexian Zou
In this paper, we propose novel training schemes for spoken question answering with a self-supervised training stage and a contrastive representation learning stage.
no code implementations • 26 Aug 2021 • Dongsheng Chen, Zhiqi Huang, Yuexian Zou
Spoken Language Understanding (SLU), including intent detection and slot filling, is a core component in human-computer interaction.
no code implementations • 25 Aug 2021 • Cong Wang, Yan Huang, Yuexian Zou, Yong Xu
However, it is noted that ASM-based SIDM degrades in performance when dehazing real-world hazy images due to the limited modelling ability of ASM, where the atmospheric light factor (ALF) and the angular scattering coefficient (ASC) are assumed to be constant for one image.
no code implementations • 18 Aug 2021 • Lisong Chen, Peilin Zhou, Yuexian Zou
With the auxiliary knowledge provided by the MIL Intent Decoder, we set Final Slot Decoder as the teacher model that imparts knowledge back to Initial Slot Decoder to complete the loop.
no code implementations • 12 Aug 2021 • Li Wang, Rongzhi Gu, Nuo Chen, Yuexian Zou
Recently proposed metric learning approaches have improved the generalizability of models for the KWS task, and 1D-CNN based KWS models have achieved state-of-the-art (SOTA) performance in terms of model size.
no code implementations • 12 Aug 2021 • Meng Cao, Can Zhang, Long Chen, Mike Zheng Shou, Yuexian Zou
In this paper, our analysis shows that the motion cues behind the optical flow features are complementarily informative.
Optical Flow Estimation Weakly-supervised Temporal Action Localization +1
no code implementations • Findings (ACL) 2021 • Fenglin Liu, Xuancheng Ren, Xian Wu, Bang Yang, Shen Ge, Yuexian Zou, Xu Sun
Video captioning combines video understanding and language generation.
no code implementations • 4 Jul 2021 • Zhiqi Huang, Fenglin Liu, Xian Wu, Shen Ge, Helin Wang, Wei Fan, Yuexian Zou
As a result, the proposed approach can handle various tasks, including Audio-Oriented Multimodal Machine Comprehension, Machine Reading Comprehension and Machine Listening Comprehension, in a single model, making fair comparisons possible between our model and the existing unimodal MC models.
no code implementations • 30 Jun 2021 • Liyu Wu, Yuexian Zou, Can Zhang
Efficient long-short temporal modeling is key to enhancing the performance of the action recognition task.
no code implementations • 29 Jun 2021 • Ranyu Ning, Can Zhang, Yuexian Zou
Current mainstream one-stage TAD approaches localize and classify action proposals relying on pre-defined anchors, where the location and scale of action instances are set by designers.
no code implementations • 24 Jun 2021 • Meng Cao, Can Zhang, Dongming Yang, Yuexian Zou
Compared to the traditional single-stage segmentation network, our NASK conducts the detection in a coarse-to-fine manner with the first stage segmentation spotting the rectangle text proposals and the second one retrieving compact representations.
no code implementations • 20 Jun 2021 • Fenglin Liu, Meng Gao, Tianhao Zhang, Yuexian Zou
To further improve the quality of captions generated by the model, we propose the Semantic Relationship Explorer, which explores the relationships between semantic concepts for better understanding of the image.
no code implementations • Findings (ACL) 2021 • Fenglin Liu, Changchang Yin, Xian Wu, Shen Ge, Ping Zhang, Yuexian Zou, Xu Sun
In addition, according to the analysis, the CA model can help existing models better attend to the abnormal regions and provide more accurate descriptions which are crucial for an interpretable diagnosis.
no code implementations • CVPR 2021 • Fenglin Liu, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou
In detail, PoKE explores the posterior knowledge, which provides explicit abnormal visual regions to alleviate visual data bias; PrKE explores the prior knowledge from the prior medical knowledge graph (medical knowledge) and prior radiology reports (working experience) to alleviate textual data bias.
no code implementations • 4 Jun 2021 • Nuo Chen, Chenyu You, Yuexian Zou
We also utilize the proposed self-supervised learning tasks to capture intra-sentence coherence.
no code implementations • 15 May 2021 • Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou
In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient exploding or vanishing in line with the deepness of the models, which can be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection.
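The interaction between a scaled skip connection and layer normalization can be sketched in a few lines of NumPy. This is a hypothetical illustration of the general idea (scaling the residual branch, then normalizing), not the paper's implementation; the function names and the toy sublayer are assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Standard layer normalization over the last (feature) dimension."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def scaled_residual_block(x, sublayer, scale=1.0, normalize=True):
    """y = scale * x + sublayer(x); layer norm re-standardizes the output,
    so activations stay stable even when the skip scale is mis-set."""
    y = scale * x + sublayer(x)
    return layer_norm(y) if normalize else y

x = np.random.RandomState(0).randn(4, 16)
f = lambda h: 0.1 * h  # a toy sublayer
out = scaled_residual_block(x, f, scale=5.0)
```

Even with an aggressive skip scale of 5, the normalized output has zero mean and unit variance per row, which is the stabilizing effect the abstract attributes to layer normalization.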
no code implementations • 30 Apr 2021 • Dongming Yang, Yuexian Zou, Can Zhang, Meng Cao, Jie Chen
Upon this framework, an Interaction Intensifier Module and a Correlation Parsing Module are carefully designed, where: a) interactive semantics from humans can be exploited and passed to objects to intensify interactions, and b) interactive correlations among humans, objects and interactions are integrated to promote predictions.
no code implementations • 8 Apr 2021 • Jinchuan Tian, Rongzhi Gu, Helin Wang, Yuexian Zou
Transformer-based self-supervised models are trained as feature extractors and have empowered many downstream speech tasks to achieve state-of-the-art performance.
no code implementations • 31 Mar 2021 • Helin Wang, Yuexian Zou, Wenwu Wang
In this paper, we present SpecAugment++, a novel data augmentation method for deep neural networks based acoustic scene classification (ASC).
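For context, the baseline SpecAugment family of methods masks random frequency bands and time spans of the input spectrogram. The NumPy sketch below illustrates that generic masking idea only, under assumed mask sizes; it is not the authors' SpecAugment++ variant, which extends this scheme.

```python
import numpy as np

def mask_spectrogram(spec, num_freq_bins=8, num_time_frames=10, rng=None):
    """SpecAugment-style masking: zero out one random frequency band and
    one random time span of a (freq, time) spectrogram."""
    rng = rng or np.random.default_rng(0)
    spec = spec.copy()  # keep the original untouched
    f0 = rng.integers(0, spec.shape[0] - num_freq_bins + 1)
    t0 = rng.integers(0, spec.shape[1] - num_time_frames + 1)
    spec[f0:f0 + num_freq_bins, :] = 0.0
    spec[:, t0:t0 + num_time_frames] = 0.0
    return spec

spec = np.ones((64, 100))  # toy 64-bin, 100-frame spectrogram
aug = mask_spectrogram(spec)
```

Each training example sees differently placed masks, forcing the classifier not to rely on any single band or span.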
1 code implementation • CVPR 2021 • Can Zhang, Meng Cao, Dongming Yang, Jie Chen, Yuexian Zou
In this paper, we argue that learning by comparing helps identify these hard snippets and we propose to utilize snippet Contrastive learning to Localize Actions, CoLA for short.
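Snippet contrastive learning of this kind is typically built on an InfoNCE-style objective that pulls a query snippet toward a positive snippet and away from negatives. The following is a minimal, hypothetical NumPy sketch of that generic loss (function name, temperature, and toy embeddings are assumptions, not CoLA's exact formulation).

```python
import numpy as np

def info_nce(query, positive, negatives, temperature=0.07):
    """InfoNCE loss for one query embedding against one positive and a
    set of negative embeddings; all vectors are L2-normalized first."""
    def norm(v):
        return v / np.linalg.norm(v, axis=-1, keepdims=True)
    q, p, n = norm(query), norm(positive), norm(negatives)
    logits = np.concatenate([[q @ p], n @ q]) / temperature
    logits = logits - logits.max()  # numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[0])  # positive pair sits at index 0

q = np.array([1.0, 0.0])                      # query snippet embedding
pos = np.array([0.9, 0.1])                    # easy positive
negs = np.array([[0.0, 1.0], [-1.0, 0.2]])    # hard/easy negatives
loss = info_nce(q, pos, negs)
```

A well-aligned positive yields a loss near zero; hard snippets (negatives similar to the query) dominate the gradient, which is exactly why comparing snippets helps identify them.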
no code implementations • 21 Jan 2021 • Cong Wang, Yan Huang, Yuexian Zou, Yong Xu
However, for images taken in the real world, the illumination is not uniformly distributed over the whole image, which brings model mismatch and possibly results in color shift in deep models using ASM.
no code implementations • 20 Dec 2020 • Nuo Chen, Fenglin Liu, Chenyu You, Peilin Zhou, Yuexian Zou
To predict the answer, it is common practice to employ a predictor to draw information only from the final encoder layer, which generates the coarse-grained representations of the source sequences, i.e., passage and question.
no code implementations • COLING 2020 • Zhiqi Huang, Fenglin Liu, Yuexian Zou
To this end, we propose a federated learning framework, which could unify various types of datasets as well as tasks to learn and fuse various types of knowledge, i.e., text representations, from different datasets and tasks, without the sharing of downstream task data.
no code implementations • COLING 2020 • Fenglin Liu, Xuancheng Ren, Zhiyuan Zhang, Xu Sun, Yuexian Zou
In this work, we investigate how the scale factors into the effectiveness of the skip connection and reveal that a trivial adjustment of the scale will lead to spurious gradient exploding or vanishing in line with the deepness of the models, which can be addressed by normalization, in particular layer normalization, which induces consistent improvements over the plain skip connection.
no code implementations • NeurIPS 2020 • Fenglin Liu, Xuancheng Ren, Xian Wu, Shen Ge, Wei Fan, Yuexian Zou, Xu Sun
Especially for image captioning, the attention based models are expected to ground correct image regions with proper generated words.
no code implementations • 21 Oct 2020 • Chenyu You, Nuo Chen, Yuexian Zou
Spoken conversational question answering (SCQA) requires machines to model complex dialogue flow given the speech utterances and text corpora.
Audio Signal Processing Conversational Question Answering +2
no code implementations • 21 Oct 2020 • Chenyu You, Nuo Chen, Yuexian Zou
However, recent work shows that ASR systems generate highly noisy transcripts, which critically limits the capability of machine comprehension on the SQA task.
Automatic Speech Recognition (ASR) +5
no code implementations • 18 Oct 2020 • Chenyu You, Nuo Chen, Fenglin Liu, Dongchao Yang, Yuexian Zou
In spoken question answering, QA systems are designed to answer questions from contiguous text spans within the related speech transcripts.
Automatic Speech Recognition (ASR) +2
no code implementations • 28 Sep 2020 • Peilin Zhou, Zhiqi Huang, Fenglin Liu, Yuexian Zou
However, we note that, so far, efforts to obtain better performance by supporting bidirectional and explicit information exchange between ID and SF have not been well studied. In addition, few studies attempt to capture local context information to enhance the performance of SF.
2 code implementations • 8 Aug 2020 • Can Zhang, Yuexian Zou, Guang Chen, Lei Gan
In contrast to optical flow, our PA focuses more on distilling the motion information at boundaries.
Ranked #2 on Action Recognition on Jester (Gesture Recognition)
no code implementations • 14 Jul 2020 • Dongming Yang, Yuexian Zou
However, recent HOI detection methods mostly rely on additional annotations (e.g., human pose) and neglect powerful interactive reasoning beyond convolutions.
no code implementations • 26 Apr 2020 • Meng Cao, Yuexian Zou
Specifically, NASK consists of a Text Instance Segmentation network, namely TIS (1st stage), a Text RoI Pooling module, and a Fiducial pOint eXpression module termed FOX (2nd stage).
no code implementations • 16 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Yong Xu, Lian-Wu Chen, Yuexian Zou, Dong Yu
Target speech separation refers to extracting a target speaker's voice from an overlapped audio of simultaneous talkers.
no code implementations • 11 Mar 2020 • Dongming Yang, Yuexian Zou, Jian Zhang, Ge Li
The GID block breaks through local neighborhoods and captures long-range dependencies of pixels at both the global level and the instance level across the scene, helping to detect interactions between instances.
no code implementations • 9 Mar 2020 • Rongzhi Gu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
Hand-crafted spatial features (e.g., inter-channel phase difference, IPD) play a fundamental role in recent deep learning based multi-channel speech separation (MCSS) methods.
no code implementations • 2 Jan 2020 • Rongzhi Gu, Yuexian Zou
To address these challenges, we propose a temporal-spatial neural filter, which directly estimates the target speech waveform from multi-speaker mixture in reverberant environments, assisted with directional information of the speaker(s).
no code implementations • 14 Dec 2019 • Helin Wang, Yuexian Zou, Dading Chong, Wenwu Wang
Convolutional neural networks (CNNs) are among the best-performing neural network architectures for environmental sound classification (ESC).
Acoustic Scene Classification Environmental Sound Classification +3
1 code implementation • 27 Nov 2019 • Bang Yang, Yuexian Zou, Fenglin Liu, Can Zhang
However, mainstream video captioning methods suffer from slow inference speed due to the sequential manner of autoregressive decoding, and prefer generating generic descriptions due to the insufficient training of visual words (e.g., nouns and verbs) and inadequate decoding paradigm.
no code implementations • 19 Aug 2019 • Dongming Yang, Yuexian Zou, Jian Zhang, Ge Li
Although two-stage detectors like Faster R-CNN have achieved great success in object detection thanks to the strategy of extracting region proposals with a region proposal network, they adapt poorly to real-world object detection because they do not mine hard samples when extracting region proposals.
no code implementations • 15 May 2019 • Rongzhi Gu, Jian Wu, Shi-Xiong Zhang, Lian-Wu Chen, Yong Xu, Meng Yu, Dan Su, Yuexian Zou, Dong Yu
This paper extends the previous approach and proposes a new end-to-end model for multi-channel speech separation.
no code implementations • 17 Oct 2014 • Weiyang Liu, Zhiding Yu, Lijia Lu, Yandong Wen, Hui Li, Yuexian Zou
The LCD similarity measure can be kernelized under KCRC, which theoretically links CRC and LCD under the kernel method.