no code implementations • 16 Aug 2023 • Mengjie Du, Xiang Fang, Jie Li
This technical report describes the ChinaTelecom system for Track 1 (closed) of the VoxCeleb2023 Speaker Recognition Challenge (VoxSRC 2023).
no code implementations • CVPR 2023 • Xiang Fang, Daizong Liu, Pan Zhou, Guoshun Nan
To handle the raw video bit-stream input, we propose a novel Three-branch Compressed-domain Spatial-temporal Fusion (TCSF) framework, which extracts and aggregates three kinds of low-level visual features (I-frame, motion vector and residual features) for effective and efficient grounding.
no code implementations • 5 Jan 2023 • Daizong Liu, Xiang Fang, Pan Zhou, Xing Di, Weining Lu, Yu Cheng
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific segment according to a given sentence query.
no code implementations • 23 Sep 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Yuchong Hu
In addition, due to the domain gap between different datasets, directly applying these pre-trained models to an unseen domain leads to a significant performance drop.
no code implementations • 31 Aug 2022 • Xiang Fang, Daizong Liu, Pan Zhou, Zichuan Xu, Ruixuan Li
To address this issue, in this paper, we propose a novel Hierarchical Local-Global Transformer (HLGT) to leverage this hierarchy information and model the interactions between different levels of granularity and different modalities for learning more fine-grained multi-modal representations.
no code implementations • 6 Mar 2022 • Daizong Liu, Xiang Fang, Wei Hu, Pan Zhou
Temporal sentence grounding aims to localize a target segment in an untrimmed video semantically according to a given sentence query.
1 code implementation • 23 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
Inspired by the variation and the heredity in genetics, V3H first decomposes each subspace into a variation matrix for the corresponding view and a heredity matrix for all the views to represent the unique information and the consistent information respectively.
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
In these scenarios, original image data often contain missing instances and noise, which most multi-view clustering methods ignore.
no code implementations • 20 Nov 2020 • Xiang Fang, Yuchong Hu
The first self-weighted operation assigns different weights to different features by introducing an adaptive weight matrix, which reinforces the role of important features in the joint representation and makes each graph robust.
1 code implementation • 20 Nov 2020 • Xiang Fang, Yuchong Hu, Pan Zhou, Dapeng Oliver Wu
However, different views often have distinct incompleteness, i.e., unbalanced incompleteness, which results in strong views (low-incompleteness views) and weak views (high-incompleteness views).
1 code implementation • 26 Jul 2019 • Ming Liu, Dongpeng Liu, Guangyu Sun, Yi Zhao, Duolin Wang, Fangxing Liu, Xiang Fang, Qing He, Dong Xu
Detecting inaccurate smart meters and targeting them for replacement can save significant resources.