Search Results for author: Wenhang Ge

Found 8 papers, 5 papers with code

LLM-Optic: Unveiling the Capabilities of Large Language Models for Universal Visual Grounding

no code implementations 27 May 2024 Haoyu Zhao, Wenhang Ge, Ying-Cong Chen

LLM-Optic first employs an LLM as a Text Grounder to interpret complex text queries and accurately identify objects the user intends to locate.

Visual Grounding

SG-Adapter: Enhancing Text-to-Image Generation with Scene Graph Guidance

no code implementations 24 May 2024 Guibao Shen, Luozhou Wang, Jiantao Lin, Wenhang Ge, Chaozhe Zhang, Xin Tao, Yuan Zhang, Pengfei Wan, Zhongyuan Wang, Guangyong Chen, Yijun Li, Ying-Cong Chen

In this paper, we introduce the Scene Graph Adapter (SG-Adapter), leveraging the structured representation of scene graphs to rectify inaccuracies in the original text embeddings.

Text-to-Image Generation

X-Ray: A Sequential 3D Representation for Generation

1 code implementation 22 Apr 2024 Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

In this paper, we introduce X-Ray, an innovative approach to 3D generation that employs a new sequential representation, drawing inspiration from the depth-revealing capabilities of X-Ray scans to meticulously capture both the external and internal features of objects.

3D Generation

Decompose and Realign: Tackling Condition Misalignment in Text-to-Image Diffusion Models

1 code implementation 26 Jun 2023 Luozhou Wang, Guibao Shen, Wenhang Ge, Guangyong Chen, Yijun Li, Ying-Cong Chen

The "Decompose" phase separates conditions based on pair relationships, computing the result individually for each pair.

Image Generation

Ref-NeuS: Ambiguity-Reduced Neural Implicit Surface Learning for Multi-View Reconstruction with Reflection

1 code implementation ICCV 2023 Wenhang Ge, Tao Hu, Haoyu Zhao, Shu Liu, Ying-Cong Chen

We show that, together with a reflection direction-dependent radiance, our model achieves high-quality surface reconstruction on reflective surfaces and outperforms the state of the art by a large margin.

3D Reconstruction, Multi-View 3D Reconstruction

Text-Adaptive Multiple Visual Prototype Matching for Video-Text Retrieval

no code implementations 27 Sep 2022 Chengzhi Lin, AnCong Wu, Junwei Liang, Jun Zhang, Wenhang Ge, Wei-Shi Zheng, Chunhua Shen

To address this problem, we propose a Text-Adaptive Multiple Visual Prototype Matching model, which automatically captures multiple prototypes to describe a video by adaptive aggregation of video token features.

Cross-Modal Retrieval, Retrieval

Camera-Conditioned Stable Feature Generation for Isolated Camera Supervised Person Re-IDentification

1 code implementation CVPR 2022 Chao Wu, Wenhang Ge, AnCong Wu, Xiaobin Chang

To learn camera-view invariant features for person Re-IDentification (Re-ID), the cross-camera image pairs of each person play an important role.

Person Re-Identification

Cross-Camera Feature Prediction for Intra-Camera Supervised Person Re-identification across Distant Scenes

1 code implementation 29 Jul 2021 Wenhang Ge, Chunyan Pan, AnCong Wu, Hongwei Zheng, Wei-Shi Zheng

To learn camera-invariant representations from cross-camera unpaired training data, we propose a cross-camera feature prediction method that mines cross-camera self-supervision information from camera-specific feature distributions by transforming fake cross-camera positive feature pairs and minimizing the distances between the fake pairs.

Person Re-Identification
