Search Results for author: Chiuman Ho

Found 7 papers, 2 papers with code

Catch Missing Details: Image Reconstruction with Frequency Augmented Variational Autoencoder

1 code implementation • CVPR 2023 • Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong

Popular VQ-VAE models reconstruct images by learning a discrete codebook, but their reconstruction quality degrades rapidly as the compression rate rises.

Image Generation Image Reconstruction

VideoXum: Cross-modal Visual and Textural Summarization of Videos

1 code implementation • 21 Mar 2023 • Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo

We propose a new joint video and text summarization task.

Ranked #1 on Video Summarization on videoxum

Text Summarization Video Summarization

ROIFormer: Semantic-Aware Region of Interest Transformer for Efficient Self-Supervised Monocular Depth Estimation

no code implementations • 12 Dec 2022 • Daitao Xing, Jinglin Shen, Chiuman Ho, Anthony Tzes

The exploration of mutual-benefit cross-domains has shown great potential toward accurate self-supervised depth estimation.

Monocular Depth Estimation

A Masked Bounding-Box Selection Based ResNet Predictor for Text Rotation Prediction

no code implementations • 6 Sep 2022 • Michael Yang, Yuan Lin, Chiuman Ho

Existing Optical Character Recognition (OCR) systems are capable of recognizing horizontal text in images.

Optical Character Recognition (OCR)

Open Vocabulary Multi-Label Classification with Dual-Modal Decoder on Aligned Visual-Textual Features

no code implementations • 19 Aug 2022 • Shichao Xu, Yikang Li, Jenhao Hsiao, Chiuman Ho, Zhu Qi

In computer vision, multi-label recognition is an important task with many real-world applications, but classifying previously unseen labels remains a significant challenge.

Ranked #1 on Multi-label zero-shot learning on ImageNet-1k to MSCOCO

Classification Decoder +2

Dual-Flattening Transformers through Decomposed Row and Column Queries for Semantic Segmentation

no code implementations • 22 Jan 2022 • Ying Wang, Chiuman Ho, Wenju Xu, Ziwei Xuan, Xudong Liu, Guo-Jun Qi

We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing complexity to $\mathcal{O}(hw(H+W))$, which is multiple orders of magnitude smaller than that of the naive dense transformer.

Semantic Segmentation

GCF-Net: Gated Clip Fusion Network for Video Action Recognition

no code implementations • 2 Feb 2021 • Jenhao Hsiao, Jiawei Chen, Chiuman Ho

These models are trained by applying a deep CNN to a single clip of fixed temporal length.

Action Recognition Temporal Action Localization
