2 code implementations • 9 Apr 2024 • Ming Tao, Bing-Kun Bao, Hao Tang, YaoWei Wang, Changsheng Xu
3) Story visualization and continuation models are trained and run inference independently, which is inconvenient for users.
no code implementations • 27 Sep 2023 • Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao
It maps the discrete behavior data into a continuous latent space and generates behaviors under the guidance of collaborative signals and users' multimodal preferences.
no code implementations • 30 Aug 2023 • Mingjie Qiu, Zhiyi Tan, Bing-Kun Bao
To be specific, in the proposed MSGNN model, we first devise a novel graph learning module, which directly captures long-range connectivity from trans-regional epidemic signals and integrates them into a multi-scale graph.
2 code implementations • 7 Aug 2023 • Penghang Yu, Zhiyi Tan, Guanming Lu, Bing-Kun Bao
Meanwhile, a behavior-aware fuser is designed to comprehensively model user preferences by adaptively learning the relative importance of different modality features.
2 code implementations • CVPR 2023 • Ming Tao, Bing-Kun Bao, Hao Tang, Changsheng Xu
The complex scene understanding ability of CLIP enables the discriminator to accurately assess the image quality.
Ranked #3 on Text-to-Image Generation on CUB
no code implementations • CVPR 2023 • Sisi You, Hantao Yao, Bing-Kun Bao, Changsheng Xu
Recently, Multiple Object Tracking, which consists of object detection, feature embedding, and identity association, has achieved great success.
1 code implementation • 2 Jun 2022 • Ming Tao, Bing-Kun Bao, Hao Tang, Fei Wu, Longhui Wei, Qi Tian
To solve these limitations, we propose: (i) a Dynamic Editing Block (DEBlock) which composes different editing modules dynamically for various editing requirements.
1 code implementation • 10 Jul 2021 • Jianyu Wang, Bing-Kun Bao, Changsheng Xu
However, existing graph-based methods fail to perform multi-step reasoning well, neglecting two properties of VideoQA: (1) even for the same video, different questions may require different numbers of video clips or objects to infer the answer through relational reasoning; (2) during reasoning, appearance and motion features have a complicated interdependence, being both correlated and complementary to each other.
Ranked #29 on Visual Question Answering (VQA) on MSRVTT-QA
3 code implementations • CVPR 2022 • Ming Tao, Hao Tang, Fei Wu, Xiao-Yuan Jing, Bing-Kun Bao, Changsheng Xu
To these ends, we propose a simpler but more effective Deep Fusion Generative Adversarial Network (DF-GAN).
Ranked #4 on Text-to-Image Generation on CUB (Inception score metric)