Search Results for author: Haoyu Ma

Found 36 papers, 15 papers with code

MaskINT: Video Editing via Interpolative Non-autoregressive Masked Transformers

no code implementations • 19 Dec 2023 • Haoyu Ma, Shahin Mahdizadehaghdam, Bichen Wu, Zhipeng Fan, YuChao Gu, Wenliang Zhao, Lior Shapira, Xiaohui Xie

Recent advances in generative AI have significantly enhanced image and video editing, particularly in the context of text prompt control.

Video Editing


Instance Tracking in 3D Scenes from Egocentric Videos

1 code implementation • 7 Dec 2023 • Yunhan Zhao, Haoyu Ma, Shu Kong, Charless Fowlkes

We explore this problem by first introducing a new benchmark dataset, consisting of RGB and depth videos, per-frame camera pose, and instance-level annotations in both 2D camera and 3D world coordinates.

Human-Object Interaction Detection Object Tracking


CVTHead: One-shot Controllable Head Avatar with Vertex-feature Transformer

1 code implementation • 11 Nov 2023 • Haoyu Ma, Tong Zhang, Shanlin Sun, Xiangyi Yan, Kun Han, Xiaohui Xie

Reconstructing personalized animatable head avatars has significant implications in the fields of AR/VR.

Neural Rendering


HarmonyDream: Task Harmonization Inside World Models

no code implementations • 30 Sep 2023 • Haoyu Ma, Jialong Wu, Ningya Feng, Chenjun Xiao, Dong Li, Jianye Hao, Jianmin Wang, Mingsheng Long

Model-based reinforcement learning (MBRL) holds the promise of sample-efficient learning by utilizing a world model, which models how the environment works and typically encompasses components for two tasks: observation modeling and reward modeling.

Ranked #4 on Atari Games 100k on Atari 100k

Atari Games 100k Model-based Reinforcement Learning +1


Light Field Diffusion for Single-View Novel View Synthesis

no code implementations • 20 Sep 2023 • Yifeng Xiong, Haoyu Ma, Shanlin Sun, Kun Han, Hao Tang, Xiaohui Xie

Starting from the camera pose matrices, LFD transforms them into light field encoding, with the same shape as the reference image, to describe the direction of each ray.

Denoising Novel View Synthesis +1


Hybrid-CSR: Coupling Explicit and Implicit Shape Representation for Cortical Surface Reconstruction

no code implementations • 23 Jul 2023 • Shanlin Sun, Thanh-Tung Le, Chenyu You, Hao Tang, Kun Han, Haoyu Ma, Deying Kong, Xiangyi Yan, Xiaohui Xie

We present Hybrid-CSR, a geometric deep-learning model that combines explicit and implicit shape representations for cortical surface reconstruction.

Surface Reconstruction


Pre-training Contextualized World Models with In-the-wild Videos for Reinforcement Learning

1 code implementation • NeurIPS 2023 • Jialong Wu, Haoyu Ma, Chaoyi Deng, Mingsheng Long

To tackle this issue, we introduce Contextualized World Models (ContextWM) that explicitly separate context and dynamics modeling to overcome the complexity and diversity of in-the-wild videos and facilitate knowledge transfer between distinct scenes.

Autonomous Driving Decoder +4


Hybrid Transformer and CNN Attention Network for Stereo Image Super-resolution

no code implementations • 9 May 2023 • Ming Cheng, Haoyu Ma, Qiufang Ma, Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Xuhan Sheng, Shijie Zhao, Junlin Li, Li Zhang

Multi-stage strategies are frequently employed in image restoration tasks.

Data Augmentation Image Enhancement +2


OPDN: Omnidirectional Position-aware Deformable Network for Omnidirectional Image Super-Resolution

no code implementations • 26 Apr 2023 • Xiaopeng Sun, Weiqi Li, Zhenyu Zhang, Qiufang Ma, Xuhan Sheng, Ming Cheng, Haoyu Ma, Shijie Zhao, Jian Zhang, Junlin Li, Li Zhang

Model A aims to enhance the feature extraction ability of 360° image positional information, while Model B further focuses on the high-frequency information of 360° images.

Image Super-Resolution Position


Localized Region Contrast for Enhancing Self-Supervised Learning in Medical Image Segmentation

no code implementations • 6 Apr 2023 • Xiangyi Yan, Junayed Naushad, Chenyu You, Hao Tang, Shanlin Sun, Kun Han, Haoyu Ma, James Duncan, Xiaohui Xie

In this paper, we propose a novel contrastive learning framework that integrates Localized Region Contrast (LRC) to enhance existing self-supervised pre-training methods for medical image segmentation.

Contrastive Learning Image Segmentation +5


I2F: A Unified Image-to-Feature Approach for Domain Adaptive Semantic Segmentation

no code implementations • 3 Jan 2023 • Haoyu Ma, Xiangru Lin, Yizhou Yu

This paper proposes a novel UDA pipeline for semantic segmentation that unifies image-level and feature-level adaptation.

Segmentation Semantic Segmentation +1


Peeling the Onion: Hierarchical Reduction of Data Redundancy for Efficient Vision Transformer Training

1 code implementation • 19 Nov 2022 • Zhenglun Kong, Haoyu Ma, Geng Yuan, Mengshu Sun, Yanyue Xie, Peiyan Dong, Xin Meng, Xuan Shen, Hao Tang, Minghai Qin, Tianlong Chen, Xiaolong Ma, Xiaohui Xie, Zhangyang Wang, Yanzhi Wang

Vision transformers (ViTs) have recently obtained success in many applications, but their intensive computation and heavy memory usage at both training and inference time limit their generalization.


Identity-Aware Hand Mesh Estimation and Personalization from RGB Images

1 code implementation • 22 Sep 2022 • Deying Kong, Linguang Zhang, Liangjian Chen, Haoyu Ma, Xiangyi Yan, Shanlin Sun, Xingwei Liu, Kun Han, Xiaohui Xie

In this paper, we propose an identity-aware hand mesh estimation model, which can incorporate the identity information represented by the intrinsic shape parameters of the subject.


PPT: token-Pruned Pose Transformer for monocular and multi-view human pose estimation

1 code implementation • 16 Sep 2022 • Haoyu Ma, Zhe Wang, Yifei Chen, Deying Kong, Liangjian Chen, Xingwei Liu, Xiangyi Yan, Hao Tang, Xiaohui Xie

In this paper, we propose the token-Pruned Pose Transformer (PPT) for 2D human pose estimation, which locates a rough human mask and performs self-attention only within the selected tokens.

Ranked #17 on 3D Human Pose Estimation on Human3.6M (using extra training data)

2D Human Pose Estimation 3D Human Pose Estimation


Auto-ViT-Acc: An FPGA-Aware Automatic Acceleration Framework for Vision Transformer with Mixed-Scheme Quantization

no code implementations • 10 Aug 2022 • Zhengang Li, Mengshu Sun, Alec Lu, Haoyu Ma, Geng Yuan, Yanyue Xie, Hao Tang, Yanyu Li, Miriam Leeser, Zhangyang Wang, Xue Lin, Zhenman Fang

Compared with state-of-the-art ViT quantization work (algorithmic approach only without hardware acceleration), our quantization achieves 0.47% to 1.36% higher Top-1 accuracy under the same bit-width.

Quantization


Training Your Sparse Neural Network Better with Any Mask

1 code implementation • 26 Jun 2022 • Ajay Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

Pruning large neural networks to create high-quality, independently trainable sparse masks, which can maintain similar performance to their dense counterparts, is very desirable due to the reduced space and time complexity.


NTIRE 2022 Challenge on Super-Resolution and Quality Enhancement of Compressed Video: Dataset, Methods and Results

2 code implementations • 20 Apr 2022 • Ren Yang, Radu Timofte, Meisong Zheng, Qunliang Xing, Minglang Qiao, Mai Xu, Lai Jiang, Huaida Liu, Ying Chen, Youcheng Ben, Xiao Zhou, Chen Fu, Pei Cheng, Gang Yu, Junyi Li, Renlong Wu, Zhilu Zhang, Wei Shang, Zhengyao Lv, Yunjin Chen, Mingcai Zhou, Dongwei Ren, Kai Zhang, WangMeng Zuo, Pavel Ostyakov, Vyal Dmitry, Shakarim Soltanayev, Chervontsev Sergey, Zhussip Magauiya, Xueyi Zou, Youliang Yan, Pablo Navarrete Michelini, Yunhua Lu, Diankai Zhang, Shaoli Liu, Si Gao, Biao Wu, Chengjian Zheng, Xiaofeng Zhang, Kaidi Lu, Ning Wang, Thuong Nguyen Canh, Thong Bach, Qing Wang, Xiaopeng Sun, Haoyu Ma, Shijie Zhao, Junlin Li, Liangbin Xie, Shuwei Shi, Yujiu Yang, Xintao Wang, Jinjin Gu, Chao Dong, Xiaodi Shi, Chunmei Nian, Dong Jiang, Jucai Lin, Zhihuai Xie, Mao Ye, Dengyan Luo, Liuhan Peng, Shengjie Chen, Qian Wang, Xin Liu, Boyang Liang, Hang Dong, Yuhao Huang, Kai Chen, Xingbei Guo, Yujing Sun, Huilei Wu, Pengxu Wei, Yulin Huang, Junying Chen, Ik Hyun Lee, Sunder Ali Khowaja, Jiseok Yoon

This challenge includes three tracks.

Super-Resolution


Diffeomorphic Image Registration with Neural Velocity Field

no code implementations • 25 Feb 2022 • Kun Han, Shanlin Sun, Xiangyi Yan, Chenyu You, Hao Tang, Junayed Naushad, Haoyu Ma, Deying Kong, Xiaohui Xie

Here we propose a new optimization-based method named DNVF (Diffeomorphic Image Registration with Neural Velocity Field), which utilizes a deep neural network to model the space of admissible transformations.

Image Registration


Sparsity Winning Twice: Better Robust Generalization from More Efficient Training

1 code implementation • ICLR 2022 • Tianlong Chen, Zhenyu Zhang, Pengjun Wang, Santosh Balachandra, Haoyu Ma, Zehao Wang, Zhangyang Wang

We introduce two alternatives for sparse adversarial training: (i) static sparsity, by leveraging recent results from the lottery ticket hypothesis to identify critical sparse subnetworks arising from the early training; (ii) dynamic sparsity, by allowing the sparse subnetwork to adaptively adjust its connectivity pattern (while sticking to the same sparsity ratio) throughout training.


VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer

no code implementations • 17 Jan 2022 • Mengshu Sun, Haoyu Ma, Guoliang Kang, Yifan Jiang, Tianlong Chen, Xiaolong Ma, Zhangyang Wang, Yanzhi Wang

To the best of our knowledge, this is the first time quantization has been incorporated into ViT acceleration on FPGAs with the help of a fully automatic framework to guide the quantization strategy on the software side and the accelerator implementations on the hardware side given the target frame rate.

Quantization


EI-CLIP: Entity-Aware Interventional Contrastive Learning for E-Commerce Cross-Modal Retrieval

no code implementations • CVPR 2022 • Haoyu Ma, Handong Zhao, Zhe Lin, Ajinkya Kale, Zhangyang Wang, Tong Yu, Jiuxiang Gu, Sunav Choudhary, Xiaohui Xie

recommendation, and marketing services.

Causal Inference Contrastive Learning +3


Over-the-Air Aggregation for Federated Learning: Waveform Superposition and Prototype Validation

no code implementations • 27 Oct 2021 • Huayan Guo, Yifan Zhu, Haoyu Ma, Vincent K. N. Lau, Kaibin Huang, Xiaofan Li, Huabin Nong, Mingyu Zhou

In this paper, we develop an orthogonal-frequency-division-multiplexing (OFDM)-based over-the-air (OTA) aggregation solution for wireless federated learning (FL).

Federated Learning


AFTer-UNet: Axial Fusion Transformer UNet for Medical Image Segmentation

no code implementations • 20 Oct 2021 • Xiangyi Yan, Hao Tang, Shanlin Sun, Haoyu Ma, Deying Kong, Xiaohui Xie

One has to either downsample the image or use cropped local patches to reduce GPU memory usage, which limits its performance.

Decoder Image Segmentation +4


TransFusion: Cross-view Fusion with Transformer for 3D Human Pose Estimation

1 code implementation • 18 Oct 2021 • Haoyu Ma, Liangjian Chen, Deying Kong, Zhe Wang, Xingwei Liu, Hao Tang, Xiangyi Yan, Yusheng Xie, Shih-Yao Lin, Xiaohui Xie

The 3D position encoding guided by the epipolar field provides an efficient way of encoding correspondences between pixels of different views.

Ranked #20 on 3D Human Pose Estimation on Human3.6M (using extra training data)

3D Human Pose Estimation 3D Pose Estimation


Stingy Teacher: Sparse Logits Suffice to Fail Knowledge Distillation

no code implementations • 29 Sep 2021 • Haoyu Ma, Yifan Huang, Tianlong Chen, Hao Tang, Chenyu You, Zhangyang Wang, Xiaohui Xie

However, it is unclear why the distorted distribution of the logits is catastrophic to the student model.

Knowledge Distillation


SGE net: Video object detection with squeezed GRU and information entropy map

no code implementations • 14 Jun 2021 • Rui Su, Wenjing Huang, Haoyu Ma, Xiaowei Song, Jinglu Hu

Compared with object detection of static images, video object detection is more challenging due to the motion of objects, while providing rich temporal information.

Object object-detection +1


Undistillable: Making A Nasty Teacher That CANNOT teach students

1 code implementation • ICLR 2021 • Haoyu Ma, Tianlong Chen, Ting-Kuei Hu, Chenyu You, Xiaohui Xie, Zhangyang Wang

Knowledge Distillation (KD) is a widely used technique to transfer knowledge from pre-trained teacher models to (usually more lightweight) student models.

Knowledge Distillation


Coarse-to-Fine Domain Adaptive Semantic Segmentation with Photometric Alignment and Category-Center Regularization

no code implementations • CVPR 2021 • Haoyu Ma, Xiangru Lin, Zifeng Wu, Yizhou Yu

Unsupervised domain adaptation (UDA) in semantic segmentation is a fundamental yet promising task that relieves the need for laborious annotation work.

Ranked #23 on Synthetic-to-Real Translation on SYNTHIA-to-Cityscapes

Segmentation Semantic Segmentation +2


Spending Your Winning Lottery Better After Drawing It

1 code implementation • 8 Jan 2021 • Ajay Kumar Jaiswal, Haoyu Ma, Tianlong Chen, Ying Ding, Zhangyang Wang

In this paper, we demonstrate that it is unnecessary for sparse retraining to strictly inherit those properties from the dense network.

Knowledge Distillation


SIA-GCN: A Spatial Information Aware Graph Neural Network with 2D Convolutions for Hand Pose Estimation

no code implementations • 25 Sep 2020 • Deying Kong, Haoyu Ma, Xiaohui Xie

In this paper, we extend GNNs along two directions: a) allowing features at each node to be represented by 2D spatial confidence maps instead of 1D vectors; and b) proposing an efficient operation to integrate information from neighboring nodes through 2D convolutions with different learnable kernels at each edge.

Hand Pose Estimation


Real-MFF: A Large Realistic Multi-focus Image Dataset with Ground Truth

no code implementations • 28 Mar 2020 • Juncheng Zhang, Qingmin Liao, Shaojun Liu, Haoyu Ma, Wenming Yang, Jing-Hao Xue

In this letter, we introduce a large and realistic multi-focus dataset called Real-MFF, which contains 710 pairs of source images with corresponding ground truth images.


Rotation-invariant Mixed Graphical Model Network for 2D Hand Pose Estimation

no code implementations • 5 Feb 2020 • Deying Kong, Haoyu Ma, Yifei Chen, Xiaohui Xie

In this paper, we propose a new architecture named Rotation-invariant Mixed Graphical Model Network (R-MGMN) to solve the problem of 2D hand pose estimation from a monocular RGB image.

Hand Pose Estimation


Nonparametric Structure Regularization Machine for 2D Hand Pose Estimation

1 code implementation • 24 Jan 2020 • Yifei Chen, Haoyu Ma, Deying Kong, Xiangyi Yan, Jianbao Wu, Wei Fan, Xiaohui Xie

We propose a novel Nonparametric Structure Regularization Machine (NSRM) for 2D hand pose estimation, adopting a cascade multi-task architecture to learn hand structure and keypoint representations jointly.

Hand Pose Estimation


An α-Matte Boundary Defocus Model Based Cascaded Network for Multi-focus Image Fusion

2 code implementations • 29 Oct 2019 • Haoyu Ma, Qingmin Liao, Juncheng Zhang, Shaojun Liu, Jing-Hao Xue

Based on this α-matte defocus model and the generated data, a cascaded boundary aware convolutional network termed MMF-Net is proposed and trained, aiming to achieve clearer fusion results around the FDB.


Adaptive Graphical Model Network for 2D Handpose Estimation

1 code implementation • 18 Sep 2019 • Deying Kong, Yifei Chen, Haoyu Ma, Xiangyi Yan, Xiaohui Xie

In this paper, we propose a new architecture called Adaptive Graphical Model Network (AGMN) to tackle the task of 2D hand pose estimation from a monocular RGB image.

Hand Pose Estimation


Boundary Aware Multi-Focus Image Fusion Using Deep Neural Network

no code implementations • 30 Mar 2019 • Haoyu Ma, Juncheng Zhang, Shaojun Liu, Qingmin Liao

Since it is usually difficult to capture an all-in-focus image of a 3D scene directly, various multi-focus image fusion methods are employed to generate it from several images focusing at different depths.

