Search Results for author: Haoyu Zhen

Found 5 papers, 2 papers with code

RapVerse: Coherent Vocals and Whole-Body Motions Generations from Text

no code implementations • 30 May 2024 • Jiaben Chen, Xin Yan, Yihang Chen, Siyuan Cen, Qinwei Ma, Haoyu Zhen, Kaizhi Qian, Lie Lu, Chuang Gan

In this work, we introduce a challenging task for simultaneously generating 3D holistic body motions and singing vocals directly from textual lyrics inputs, advancing beyond existing works that typically address these two modalities in isolation.

3D-VLA: A 3D Vision-Language-Action Generative World Model

no code implementations • 14 Mar 2024 • Haoyu Zhen, Xiaowen Qiu, Peihao Chen, Jincheng Yang, Xin Yan, Yilun Du, Yining Hong, Chuang Gan

Recent vision-language-action (VLA) models rely on 2D inputs, lacking integration with the broader realm of the 3D physical world.

Language Modelling, Large Language Model +1

CHORD: Category-level Hand-held Object Reconstruction via Shape Deformation

no code implementations • ICCV 2023 • Kailin Li, Lixin Yang, Haoyu Zhen, Zenan Lin, Xinyu Zhan, Licheng Zhong, Jian Xu, Kejian Wu, Cewu Lu

This can be attributed to the fact that humans have mastered the shape prior of the 'mug' category, and can quickly establish the corresponding relations between different mug instances and the prior, such as where the rim and handle are located.

Object Reconstruction

Color-NeuS: Reconstructing Neural Implicit Surfaces with Color

1 code implementation • 14 Aug 2023 • Licheng Zhong, Lixin Yang, Kailin Li, Haoyu Zhen, Mei Han, Cewu Lu

Mesh is extracted from the signed distance function (SDF) network for the surface, and color for each surface vertex is drawn from the global color network.

3D-LLM: Injecting the 3D World into Large Language Models

5 code implementations • NeurIPS 2023 • Yining Hong, Haoyu Zhen, Peihao Chen, Shuhong Zheng, Yilun Du, Zhenfang Chen, Chuang Gan

Furthermore, experiments on our held-in datasets for 3D captioning, task composition, and 3D-assisted dialogue show that our model outperforms 2D VLMs.

Ranked #4 on 3D Question Answering (3D-QA) on ScanQA Test w/ objects

3D Object Captioning, 3D Question Answering (3D-QA) +3
