Search Results for author: Zhenyu He

Found 40 papers, 17 papers with code

Spatial-Temporal Multi-level Association for Video Object Segmentation

no code implementations • 9 Apr 2024 • Deshui Miao, Xin Li, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

In addition, we propose a spatial-temporal memory to assist feature association and temporal ID assignment and correlation.

Object Segmentation +3

RTracker: Recoverable Tracking via PN Tree Structured Memory

1 code implementation • 28 Mar 2024 • Yuqing Huang, Xin Li, Zikun Zhou, YaoWei Wang, Zhenyu He, Ming-Hsuan Yang

Upon the PN tree memory, we develop corresponding walking rules for determining the state of the target and define a set of control flows to unite the tracker and the detector in different tracking scenarios.

Do Efficient Transformers Really Save Computation?

no code implementations • 21 Feb 2024 • Kai Yang, Jan Ackermann, Zhenyu He, Guhao Feng, Bohang Zhang, Yunzhen Feng, Qiwei Ye, Di He, LiWei Wang

Our results show that while these models are expressive enough to solve general DP tasks, contrary to expectations, they require a model size that scales with the problem size.

Two Stones Hit One Bird: Bilevel Positional Encoding for Better Length Extrapolation

no code implementations • 29 Jan 2024 • Zhenyu He, Guhao Feng, Shengjie Luo, Kai Yang, Di He, Jingjing Xu, Zhi Zhang, Hongxia Yang, LiWei Wang

In this work, we leverage the intrinsic segmentation of language sequences and design a new positional encoding method called Bilevel Positional Encoding (BiPE).

Disentanglement Position

REST: Retrieval-Based Speculative Decoding

1 code implementation • 14 Nov 2023 • Zhenyu He, Zexuan Zhong, Tianle Cai, Jason D. Lee, Di He

We introduce Retrieval-Based Speculative Decoding (REST), a novel algorithm designed to speed up language model generation.

Language Modelling Retrieval +1

Examining User-Friendly and Open-Sourced Large GPT Models: A Survey on Language, Multimodal, and Scientific GPT Models

1 code implementation • 27 Aug 2023 • Kaiyuan Gao, Sunan He, Zhenyu He, Jiacheng Lin, Qizhi Pei, Jie Shao, Wei Zhang

Generative pre-trained transformer (GPT) models have revolutionized the field of natural language processing (NLP) with remarkable performance in various tasks and also extend their power to multimodal domains.

Channel and Spatial Relation-Propagation Network for RGB-Thermal Semantic Segmentation

no code implementations • 24 Aug 2023 • Zikun Zhou, Shukun Wu, Guoqing Zhu, Hongpeng Wang, Zhenyu He

In this paper, we propose a Channel and Spatial Relation-Propagation Network (CSRPNet) for RGB-T semantic segmentation, which propagates only modality-shared information across different modalities and alleviates the modality-specific information contamination issue.

Ranked #12 on Thermal Image Segmentation on PST900

Relation Segmentation +2

Cross-Modality Proposal-guided Feature Mining for Unregistered RGB-Thermal Pedestrian Detection

no code implementations • 23 Aug 2023 • Chao Tian, Zikun Zhou, Yuqing Huang, Gaojun Li, Zhenyu He

RGB-Thermal (RGB-T) pedestrian detection aims to locate the pedestrians in RGB-T image pairs to exploit the complementation between the two modalities for improving detection robustness in extreme conditions.

Data Augmentation Pedestrian Detection

CiteTracker: Correlating Image and Text for Visual Tracking

1 code implementation • ICCV 2023 • Xin Li, Yuqing Huang, Zhenyu He, YaoWei Wang, Huchuan Lu, Ming-Hsuan Yang

Existing visual tracking methods typically take an image patch as the reference of the target to perform tracking.

Attribute Descriptive +2

Transferable Decoding with Visual Entities for Zero-Shot Image Captioning

1 code implementation • ICCV 2023 • Junjie Fei, Teng Wang, Jinrui Zhang, Zhenyu He, Chengjie Wang, Feng Zheng

In this paper, we propose ViECap, a transferable decoding model that leverages entity-aware decoding to generate descriptions in both seen and unseen scenarios.

Caption Generation Hallucination +2

ZeroPose: CAD-Model-based Zero-Shot Pose Estimation

no code implementations • 29 May 2023 • Jianqiu Chen, Mingshan Sun, Tianpeng Bao, Rui Zhao, Liwei Wu, Zhenyu He

In this paper, we present a CAD model-based zero-shot pose estimation pipeline called ZeroPose.

Instance Segmentation Object +3

Reliability-Hierarchical Memory Network for Scribble-Supervised Video Object Segmentation

1 code implementation • 25 Mar 2023 • Zikun Zhou, Kaige Mao, Wenjie Pei, Hongpeng Wang, YaoWei Wang, Zhenyu He

To be specific, RHMNet first only uses the memory in the high-reliability level to locate the region with high reliability belonging to the target, which is highly similar to the initial target scribble.

Semantic Segmentation Video Object Segmentation +1

Joint Visual Grounding and Tracking with Natural Language Specification

1 code implementation • CVPR 2023 • Li Zhou, Zikun Zhou, Kaige Mao, Zhenyu He

Such a separated framework overlooks the link between visual grounding and tracking, namely that the natural language descriptions provide global semantic cues for localizing the target in both steps.

Ranked #3 on Visual Tracking on TNL2K

Visual Grounding Visual Tracking

Audio2Gestures: Generating Diverse Gestures from Audio

no code implementations • 17 Jan 2023 • Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Linchao Bao, Zhenyu He

Finally, we demonstrate that our method can be readily used to generate motion sequences with user-specified motion clips on the timeline.

Gesture Generation

Geo6D: Geometric Constraints Learning for 6D Pose Estimation

no code implementations • 20 Oct 2022 • Jianqiu Chen, Mingshan Sun, Ye Zheng, Tianpeng Bao, Zhenyu He, Donghai Li, Guoqiang Jin, Rui Zhao, Liwei Wu, Xiaoke Jiang

Numerous 6D pose estimation methods have been proposed that employ end-to-end regression to directly estimate the target pose parameters.

6D Pose Estimation object-detection +3

How Image Generation Helps Visible-to-Infrared Person Re-Identification?

no code implementations • 4 Oct 2022 • Honghu Pan, Yongyong Chen, Yunqi He, Xin Li, Zhenyu He

To this end, we propose Flow2Flow, a unified framework that could jointly achieve training sample expansion and cross-modality image generation for V2I person ReID.

Image Generation Person Re-Identification

Multi-Granularity Graph Pooling for Video-based Person Re-Identification

no code implementations • 23 Sep 2022 • Honghu Pan, Yongyong Chen, Zhenyu He

To downsample the graph, we propose a multi-head full attention graph pooling (MHFAPool) layer, which integrates the advantages of existing node clustering and node selection pooling methods.

Node Clustering Retrieval +2

Towards Complete-View and High-Level Pose-based Gait Recognition

no code implementations • 23 Sep 2022 • Honghu Pan, Yongyong Chen, Tingyang Xu, Yunqi He, Zhenyu He

Extensive experiments on two large gait recognition datasets, i.e., CASIA-B and OUMVLP-Pose, demonstrate that our method outperforms the baseline model and existing pose-based methods by a large margin.

Gait Recognition Generative Adversarial Network +1

Pose-Aided Video-based Person Re-Identification via Recurrent Graph Convolutional Network

no code implementations • 23 Sep 2022 • Honghu Pan, Qiao Liu, Yongyong Chen, Yunqi He, Yuan Zheng, Feng Zheng, Zhenyu He

Finally, we propose a dual-attention method consisting of node-attention and time-attention to obtain the temporal graph representation from the node embeddings, where the self-attention mechanism is employed to learn the importance of each node and each frame.

Retrieval Video-Based Person Re-Identification +1

SSORN: Self-Supervised Outlier Removal Network for Robust Homography Estimation

no code implementations • 30 Aug 2022 • Yi Li, Wenjie Pei, Zhenyu He

In this paper, we attempt to build a deep learning model that mimics all four steps in the traditional homography estimation pipeline.

Denoising Homography Estimation

Two-Stage Neural Contextual Bandits for Personalised News Recommendation

no code implementations • 26 Jun 2022 • Mengyan Zhang, Thanh Nguyen-Tang, Fangzhao Wu, Zhenyu He, Xing Xie, Cheng Soon Ong

We consider the problem of personalised news recommendation where each user consumes news in a sequential fashion.

Computational Efficiency Multi-Armed Bandits +2

Global Tracking via Ensemble of Local Trackers

1 code implementation • CVPR 2022 • Zikun Zhou, Jianqiu Chen, Wenjie Pei, Kaige Mao, Hongpeng Wang, Zhenyu He

While it can exploit the temporal context like historical appearances and locations of the target, a potential limitation of such strategy is that the local tracker tends to misidentify a nearby distractor as the target instead of activating the re-detector when the real target is out of view.

Skating-Mixer: Long-Term Sport Audio-Visual Modeling with MLPs

1 code implementation • 8 Mar 2022 • Jingfei Xia, Mingchen Zhuge, Tiantian Geng, Shun Fan, Yuantai Wei, Zhenyu He, Feng Zheng

Figure skating scoring is challenging because it requires judging the technical moves of the players as well as their coordination with the background music.

Representation Learning

GuidedMix-Net: Semi-supervised Semantic Segmentation by Using Labeled Images as Reference

no code implementations • 28 Dec 2021 • Peng Tu, Yawen Huang, Feng Zheng, Zhenyu He, Liujun Cao, Ling Shao

In this paper, we propose a novel method for semi-supervised semantic segmentation named GuidedMix-Net, by leveraging labeled information to guide the learning of unlabeled instances.

Segmentation Semi-Supervised Semantic Segmentation

Active Learning for Deep Visual Tracking

no code implementations • 17 Oct 2021 • Di Yuan, Xiaojun Chang, Yi Yang, Qiao Liu, Dehua Wang, Zhenyu He

In this paper, we propose an active learning method for deep visual tracking, which selects and annotates the unlabeled samples to train the deep CNNs model.

Active Learning Visual Tracking

Audio2Gestures: Generating Diverse Gestures from Speech Audio with Conditional Variational Autoencoders

no code implementations • ICCV 2021 • Jing Li, Di Kang, Wenjie Pei, Xuefei Zhe, Ying Zhang, Zhenyu He, Linchao Bao

In order to overcome this problem, we propose a novel conditional variational autoencoder (VAE) that explicitly models one-to-many audio-to-motion mapping by splitting the cross-modal latent code into shared code and motion-specific code.

Ranked #3 on Gesture Generation on BEAT

Gesture Generation

Saliency-Associated Object Tracking

1 code implementation • ICCV 2021 • Zikun Zhou, Wenjie Pei, Xin Li, Hongpeng Wang, Feng Zheng, Zhenyu He

A potential limitation of such trackers is that not all patches are equally informative for tracking.

Object Object Tracking

Self-Supervised Tracking via Target-Aware Data Synthesis

no code implementations • 21 Jun 2021 • Xin Li, Wenjie Pei, YaoWei Wang, Zhenyu He, Huchuan Lu, Ming-Hsuan Yang

While deep-learning based tracking methods have achieved substantial progress, they entail large-scale and high-quality annotated data for sufficient training.

Representation Learning Self-Supervised Learning +1

SiamCorners: Siamese Corner Networks for Visual Tracking

1 code implementation • 15 Apr 2021 • Kai Yang, Zhenyu He, Wenjie Pei, Zikun Zhou, Xin Li, Di Yuan, Haijun Zhang

By tracking a target as a pair of corners, we avoid the need to design the anchor boxes.

Region Proposal Visual Tracking

TCDesc: Learning Topology Consistent Descriptors for Image Matching

no code implementations • 13 Sep 2020 • Honghu Pan, Fanyang Meng, Nana Fan, Zhenyu He

Our method has the following two advantages: (1) We are the first to consider neighborhood information of descriptors, while former works mainly focus on neighborhood consistency of feature points; (2) Our method can be applied in any former work of learning descriptors by triplet loss.

LSOTB-TIR: A Large-Scale High-Diversity Thermal Infrared Object Tracking Benchmark

1 code implementation • 3 Aug 2020 • Qiao Liu, Xin Li, Zhenyu He, Chenglong Li, Jun Li, Zikun Zhou, Di Yuan, Jing Li, Kai Yang, Nana Fan, Feng Zheng

We evaluate and analyze more than 30 trackers on LSOTB-TIR to provide a series of baselines, and the results show that deep trackers achieve promising performance.

Thermal Infrared Object Tracking Vocal Bursts Intensity Prediction

Accurate Bounding-box Regression with Distance-IoU Loss for Visual Tracking

no code implementations • 3 Jul 2020 • Di Yuan, Xiu Shu, Nana Fan, Xiaojun Chang, Qiao Liu, Zhenyu He

Moreover, we introduce a classification part that is trained online and optimized with a Conjugate-Gradient-based strategy to guarantee real-time tracking speed.

regression Visual Tracking

TCDesc: Learning Topology Consistent Descriptors

no code implementations • 5 Jun 2020 • Honghu Pan, Fanyang Meng, Zhenyu He, Yongsheng Liang, Wei Liu

Then we define topology distance between descriptors as the difference of their topology vectors.

Multi-Task Driven Feature Models for Thermal Infrared Tracking

1 code implementation • 26 Nov 2019 • Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, Wei Liu, Yongsheng Liang

These two feature models are learned using a multi-task matching framework and are jointly optimized on the TIR tracking task.

Thermal Infrared Object Tracking

Learning Deep Multi-Level Similarity for Thermal Infrared Object Tracking

1 code implementation • 9 Jun 2019 • Qiao Liu, Xin Li, Zhenyu He, Nana Fan, Di Yuan, Hongpeng Wang

These two similarities complement each other and hence enhance the discriminative capacity of the network for handling distractors.

Target-Aware Deep Tracking

no code implementations • CVPR 2019 • Xin Li, Chao Ma, Baoyuan Wu, Zhenyu He, Ming-Hsuan Yang

Despite demonstrated successes for numerous vision tasks, the contributions of using pre-trained deep features for visual tracking are not as significant as that for object recognition.

Object Object Recognition +1

Region-filtering Correlation Tracking

no code implementations • 23 Mar 2018 • Nana Fan, Zhenyu He

The IRs in training samples from cyclic shifts of the base training sample severely degrade the quality of a tracking model.

Visual Tracking

PTB-TIR: A Thermal Infrared Pedestrian Tracking Benchmark

1 code implementation • 18 Jan 2018 • Qiao Liu, Zhenyu He, Xin Li, Yuan Zheng

The ability to evaluate the TIR pedestrian tracker fairly, on a benchmark dataset, is significant for the development of this field.

Attribute Thermal Infrared Object Tracking

Hierarchical Spatial-aware Siamese Network for Thermal Infrared Object Tracking

1 code implementation • 27 Nov 2017 • Xin Li, Qiao Liu, Nana Fan, Zhenyu He, Hongzhi Wang

In this paper, we cast the TIR tracking problem as a similarity verification task, which is coupled well to the objective of the tracking task.

General Classification Thermal Infrared Object Tracking

Deep Convolutional Neural Networks for Thermal Infrared Object Tracking

1 code implementation • Knowledge-Based Systems 2017 • Qiao Liu, Xiaohuan Lu, Zhenyu He, Chunkai Zhang, WenSheng Chen

We observe that the features from the fully-connected layer are not suitable for thermal infrared tracking due to the lack of spatial information of the target, while the features from the convolution layers are.

Object Thermal Infrared Object Tracking +2
