Search Results for author: Sanyuan Zhao

Found 8 papers, 3 papers with code

InteractiveVideo: User-Centric Controllable Video Generation with Synergistic Multimodal Instructions

1 code implementation • 5 Feb 2024 • Yiyuan Zhang, Yuhao Kang, Zhixin Zhang, Xiaohan Ding, Sanyuan Zhao, Xiangyu Yue

We introduce $\textit{InteractiveVideo}$, a user-centric framework for video generation.



TextFormer: A Query-based End-to-End Text Spotter with Mixed Supervision

no code implementations • 6 Jun 2023 • Yukun Zhai, Xiaoqiang Zhang, Xiameng Qin, Sanyuan Zhao, Xingping Dong, Jianbing Shen

End-to-end text spotting is a vital computer vision task that aims to integrate scene text detection and recognition into a unified framework.

Decoder Scene Text Detection +2


Generalized Few-Shot 3D Object Detection of LiDAR Point Cloud for Autonomous Driving

no code implementations • 8 Feb 2023 • Jiawei Liu, Xingping Dong, Sanyuan Zhao, Jianbing Shen

To achieve simultaneous detection for both common and rare objects, we propose a novel task, called generalized few-shot 3D object detection, where we have a large amount of training data for common (base) objects, but only a few data for rare (novel) classes.

3D Object Detection Autonomous Driving +1


Bilateral Cross-Modality Graph Matching Attention for Feature Fusion in Visual Question Answering

1 code implementation • 14 Dec 2021 • JianJian Cao, Xiameng Qin, Sanyuan Zhao, Jianbing Shen

In this paper, we focus on these two problems and propose a Graph Matching Attention (GMA) network.

Graph Matching Question Answering +1


Cross-Modality Person Re-Identification via Modality Confusion and Center Aggregation

no code implementations • ICCV 2021 • Xin Hao, Sanyuan Zhao, Mang Ye, Jianbing Shen

Cross-modality person re-identification is a challenging task due to large cross-modality discrepancy and intra-modality variations.

Cross-Modality Person Re-identification Person Re-Identification


Self-Learning with Rectification Strategy for Human Parsing

no code implementations • CVPR 2020 • Tao Li, Zhiyuan Liang, Sanyuan Zhao, Jiahao Gong, Jianbing Shen

For the global error, we first transform category-wise features into a high-level graph model with coarse-grained structural information, and then decouple the high-level graph to reconstruct the category features.

Human Parsing Self-Learning


Pyramid Dilated Deeper ConvLSTM for Video Salient Object Detection

1 code implementation • ECCV 2018 • Hongmei Song, Wenguan Wang, Sanyuan Zhao, Jianbing Shen, Kin-Man Lam

This paper proposes a fast video salient object detection model, based on a novel recurrent network architecture, named Pyramid Dilated Bidirectional ConvLSTM (PDB-ConvLSTM).

Ranked #1 on Video Salient Object Detection on UVSD (using extra training data)

Object object-detection +5



Improved Face Detection and Alignment using Cascade Deep Convolutional Network

no code implementations • 28 Jul 2017 • Weilin Cong, Sanyuan Zhao, Hui Tian, Jianbing Shen

Real-world face detection and alignment demand an advanced discriminative model to address the challenges posed by pose, lighting, and expression.

Face Detection

