1 code implementation • CVPR 2023 • Xinmiao Lin, Yikang Li, Jenhao Hsiao, Chiuman Ho, Yu Kong
The popular VQ-VAE models reconstruct images by learning a discrete codebook, but their reconstruction quality degrades rapidly as the compression rate rises.
1 code implementation • 21 Mar 2023 • Jingyang Lin, Hang Hua, Ming Chen, Yikang Li, Jenhao Hsiao, Chiuman Ho, Jiebo Luo
We propose a new joint video and text summarization task.
Ranked #1 on Video Summarization on videoxum
no code implementations • 12 Dec 2022 • Daitao Xing, Jinglin Shen, Chiuman Ho, Anthony Tzes
Exploring mutually beneficial cross-domain learning has shown great potential for accurate self-supervised depth estimation.
no code implementations • 6 Sep 2022 • Michael Yang, Yuan Lin, Chiuman Ho
Existing Optical Character Recognition (OCR) systems can recognize horizontal text in images.
no code implementations • 19 Aug 2022 • Shichao Xu, Yikang Li, Jenhao Hsiao, Chiuman Ho, Zhu Qi
In computer vision, multi-label recognition is an important task with many real-world applications, but classifying previously unseen labels remains a significant challenge.
no code implementations • 22 Jan 2022 • Ying Wang, Chiuman Ho, Wenju Xu, Ziwei Xuan, Xudong Liu, Guo-Jun Qi
We propose a Dual-Flattening Transformer (DFlatFormer) to enable high-resolution output by reducing complexity to $\mathcal{O}(hw(H+W))$, which is multiple orders of magnitude smaller than that of the naive dense transformer.
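To give a rough sense of the scale of this reduction, the sketch below compares the two operation counts. It assumes, beyond what the snippet states, that $h \times w$ is the input feature-map size, $H \times W$ is the output size, and the naive dense transformer costs $\mathcal{O}(hwHW)$. The function names and the example sizes are hypothetical, and constant factors are ignored.

```python
# Rough operation counts, ignoring constant factors.
# Assumption: the naive dense transformer scales as h*w*H*W; only the
# h*w*(H+W) figure comes from the abstract snippet above.

def naive_cost(h: int, w: int, H: int, W: int) -> int:
    """Assumed cost of attending from every output to every input position."""
    return h * w * H * W


def dual_flattening_cost(h: int, w: int, H: int, W: int) -> int:
    """Cost stated in the abstract: h*w*(H+W)."""
    return h * w * (H + W)


# Hypothetical sizes: a 32x32 feature map decoded to a 512x512 output.
h, w, H, W = 32, 32, 512, 512
ratio = naive_cost(h, w, H, W) / dual_flattening_cost(h, w, H, W)
print(f"naive / dual-flattening = {ratio:.0f}x")
```

Under these assumptions the ratio simplifies to $HW/(H+W)$, so the saving grows with the output resolution.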
no code implementations • 2 Feb 2021 • Jenhao Hsiao, Jiawei Chen, Chiuman Ho
These models are trained by applying a deep CNN to a single clip of fixed temporal length.