6 code implementations • 15 Nov 2023 • Fuxiao Liu, Xiaoyang Wang, Wenlin Yao, Jianshu Chen, Kaiqiang Song, Sangwoo Cho, Yaser Yacoob, Dong Yu
Recognizing the need for a comprehensive evaluation of LMM chart understanding, we also propose the MultiModal Chart Benchmark (MMC-Benchmark), a human-annotated benchmark with nine distinct tasks evaluating reasoning capabilities over charts.
6 code implementations • 23 Oct 2023 • Tianrui Guan, Fuxiao Liu, Xiyang Wu, Ruiqi Xian, Zongxia Li, Xiaoyu Liu, Xijun Wang, Lichang Chen, Furong Huang, Yaser Yacoob, Dinesh Manocha, Tianyi Zhou
Our comprehensive case studies within HallusionBench shed light on the challenges of hallucination and illusion in LVLMs.
Ranked #1 on Visual Question Answering (VQA) on HallusionBench
4 code implementations • 26 Jun 2023 • Fuxiao Liu, Kevin Lin, Linjie Li, JianFeng Wang, Yaser Yacoob, Lijuan Wang
To efficiently measure the hallucination generated by LMMs, we propose GPT4-Assisted Visual Instruction Evaluation (GAVIE), a stable approach to evaluate visual instruction tuning like human experts.
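The entry above describes GAVIE, which uses GPT-4 as a judge of visual instruction-tuning outputs. A minimal sketch of that evaluation pattern is shown below; the prompt wording, the 0-10 relevancy/accuracy scales, and the helper names are illustrative assumptions, not the paper's exact rubric, and the judge-model call itself is left abstract.

```python
import re

def build_judge_prompt(instruction: str, response: str, image_caption: str) -> str:
    """Assemble a prompt asking a judge model (e.g. GPT-4) to rate a
    visual-instruction response. Wording is a hypothetical stand-in for
    the GAVIE rubric."""
    return (
        "You are an expert judge of visual instruction tuning.\n"
        f"Image description: {image_caption}\n"
        f"Instruction: {instruction}\n"
        f"Model response: {response}\n"
        "Rate RELEVANCY and ACCURACY on a 0-10 scale, answering "
        "exactly in the form 'RELEVANCY: x ACCURACY: y'."
    )

def parse_scores(judge_output: str) -> dict:
    """Extract the two numeric scores from the judge model's reply."""
    m = re.search(r"RELEVANCY:\s*(\d+)\s*ACCURACY:\s*(\d+)", judge_output)
    if not m:
        raise ValueError("unparseable judge output")
    return {"relevancy": int(m.group(1)), "accuracy": int(m.group(2))}
```

In practice the prompt would be sent to the judge LLM and the reply fed through `parse_scores`; averaging the parsed scores over a dataset gives a hallucination-sensitive quality metric without human raters.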
Ranked #4 on Visual Question Answering (VQA) on HallusionBench
no code implementations • 15 Feb 2023 • Trevine Oorloff, Yaser Yacoob
While recent research has progressively overcome the low-resolution constraint of one-shot face video re-enactment with the help of StyleGAN's high-fidelity portrait generation, these approaches rely on at least one of the following: explicit 2D/3D priors, optical-flow-based warping as motion descriptors, off-the-shelf encoders, etc., which constrain their performance (e.g., inconsistent predictions, inability to capture fine facial details and accessories, poor generalization, artifacts).
1 code implementation • 15 Feb 2023 • Fuxiao Liu, Yaser Yacoob, Abhinav Shrivastava
We introduce a new benchmark, COVID-VTS, for fact-checking multi-modal information involving short-duration videos with COVID-19-focused information from both the real world and machine generation.
no code implementations • ICCV 2023 • Trevine Oorloff, Yaser Yacoob
Addressing these limitations, we propose a novel framework exploiting the implicit 3D prior and inherent latent properties of StyleGAN2 to facilitate one-shot face re-enactment at 1024x1024 resolution (1) with zero dependencies on explicit structural priors, (2) accommodating attribute edits, and (3) robust to diverse facial expressions and head poses of the source frame.
2 code implementations • 28 Mar 2022 • Trevine Oorloff, Yaser Yacoob
To this end, we propose an end-to-end expressive face video encoding approach that facilitates data-efficient high-quality video re-synthesis by optimizing low-dimensional edits of a single Identity-latent.
no code implementations • CVPR 2018 • Hao Zhou, Jin Sun, Yaser Yacoob, David W. Jacobs
We propose to train a deep Convolutional Neural Network (CNN) to regress lighting parameters from a single face image.
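The entry above describes regressing lighting parameters from a single face image with a CNN. A minimal PyTorch sketch of that setup follows; the layer sizes and the 9-dimensional spherical-harmonics-style output are illustrative assumptions, not the paper's actual architecture.

```python
import torch
import torch.nn as nn

class LightingCNN(nn.Module):
    """Toy CNN that maps an RGB face image to a low-dimensional lighting
    vector (9 coefficients here, an assumed stand-in for SH lighting)."""
    def __init__(self, n_light_params: int = 9):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),  # global pooling -> fixed-size feature
        )
        # Plain linear head: regression, so no output activation.
        self.head = nn.Linear(32, n_light_params)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x).flatten(1)
        return self.head(f)

model = LightingCNN()
out = model(torch.randn(2, 3, 64, 64))  # batch of two 64x64 RGB faces
```

Training would minimize an MSE loss between `out` and ground-truth lighting coefficients; the regression framing (continuous targets, no softmax) is the key point the entry describes.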
no code implementations • 18 Dec 2015 • Yaser Yacoob
This paper considers the intra-image color-space of an object or a scene when these are subject to a dominant single-source of variation.