1 code implementation • CVPR 2023 • Yu Wang, Pengchong Qiao, Chang Liu, Guoli Song, Xiawu Zheng, Jie Chen
We argue that an overlooked problem in robust SSL is the corrupted information at the semantic level, which practically limits the development of the field.
1 code implementation • 10 Dec 2022 • Runyi Yu, Zhennan Wang, Yinhuai Wang, Kehan Li, Yian Zhao, Jian Zhang, Guoli Song, Jie Chen
By analyzing the input and output of each encoder layer in VTs using reparameterization and visualization, we find that the default PE joining method (simply adding the PE and the patch embedding together) applies the same affine transformation to the token embedding and the PE, which limits the expressiveness of the PE and hence constrains the performance of VTs.
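A minimal PyTorch-style sketch of the default PE joining analyzed above; the layer, sizes, and initialization here are illustrative assumptions rather than the authors' code. It shows that once the PE is simply added to the patch embedding, any subsequent affine map acts identically on both terms.

```python
import torch
import torch.nn as nn

# Illustrative sizes (assumptions, not from the paper).
num_tokens, dim = 197, 768
patch_embed = torch.randn(1, num_tokens, dim)        # token (patch) embeddings
pos_embed = torch.randn(1, num_tokens, dim) * 0.02   # stand-in for a learned PE

# Default joining: simply add the PE to the patch embedding.
x = patch_embed + pos_embed

# Any subsequent affine map (e.g. a projection inside an encoder layer)
# then acts identically on both terms, since W(x + p) + b = Wx + Wp + b.
proj = nn.Linear(dim, dim)
out_joint = proj(x)
out_split = proj(patch_embed) + proj(pos_embed) - proj.bias  # avoid counting the bias twice
print(torch.allclose(out_joint, out_split, atol=1e-6))       # True
```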
4 code implementations • 21 Nov 2022 • Peng Jin, Jinfa Huang, Fenglin Liu, Xian Wu, Shen Ge, Guoli Song, David A. Clifton, Jie Chen
Most video-and-language representation learning approaches employ contrastive learning, e.g., CLIP, to project the video and text features into a common latent space according to the semantic similarities of text-video pairs, as sketched after this entry.
Ranked #2 on Video Retrieval on LSMDC (text-to-video Mean Rank metric)
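For the text-video contrastive setup referenced above, here is a minimal sketch of a CLIP-style symmetric InfoNCE objective, assuming pre-pooled video and text features from placeholder encoders; the names, dimensions, and temperature are assumptions, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def clip_style_contrastive_loss(video_feat, text_feat, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired video/text features.

    video_feat, text_feat: (batch, dim) tensors from placeholder encoders.
    """
    v = F.normalize(video_feat, dim=-1)
    t = F.normalize(text_feat, dim=-1)
    logits = v @ t.t() / temperature                  # pairwise cosine similarities
    targets = torch.arange(v.size(0), device=v.device)
    loss_v2t = F.cross_entropy(logits, targets)       # video -> text retrieval
    loss_t2v = F.cross_entropy(logits.t(), targets)   # text -> video retrieval
    return (loss_v2t + loss_t2v) / 2

# Example with random placeholder features.
loss = clip_style_contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))
```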
no code implementations • CVPR 2023 • Pengchong Qiao, Zhidan Wei, Yu Wang, Zhennan Wang, Guoli Song, Fan Xu, Xiangyang Ji, Chang Liu, Jie Chen
Semi-supervised learning (SSL) essentially pursues class boundary exploration with less dependence on human annotations.
no code implementations • CVPR 2023 • Kehan Li, Zhennan Wang, Zesen Cheng, Runyi Yu, Yian Zhao, Guoli Song, Chang Liu, Li Yuan, Jie Chen
Recently, self-supervised large-scale visual pre-training models have shown great promise in representing pixel-level semantic relationships, significantly promoting the development of unsupervised dense prediction tasks, e.g., unsupervised semantic segmentation (USS).
no code implementations • 21 Sep 2022 • Hao Li, Jinfa Huang, Peng Jin, Guoli Song, Qi Wu, Jie Chen
Under this setting, these 2D spatial reasoning approaches cannot distinguish the fine-grained spatial relations between visual objects and scene text on the same image plane, thereby impairing the interpretability and performance of TextVQA models.
1 code implementation • 20 Jul 2022 • Kehan Li, Runyi Yu, Zhennan Wang, Li Yuan, Guoli Song, Jie Chen
Therefore, our locality guidance approach is very simple and efficient, and can serve as a basic performance enhancement method for VTs on tiny datasets.
no code implementations • 6 Jul 2022 • Zhennan Wang, Kehan Li, Runyi Yu, Yian Zhao, Pengchong Qiao, Chang Liu, Fan Xu, Xiangyang Ji, Guoli Song, Jie Chen
In this paper, we analyze batch normalization from the perspective of discriminability and find a disadvantage ignored by previous studies: differences in the $l_2$ norms of sample features can hinder batch normalization from obtaining more distinguishable inter-class features and more compact intra-class features.
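A small synthetic sketch of the observation above, using plain feature-wise batch normalization; the data and shapes are made up for illustration and are not taken from the paper's experiments.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Synthetic batch: two samples share a direction (same "class") but differ in l2 norm.
base = torch.randn(1, 16)
feats = torch.cat([base, 5.0 * base, torch.randn(2, 16)], dim=0)  # (batch=4, dim=16)

print(feats.norm(p=2, dim=1))   # per-sample l2 norms differ widely

bn = nn.BatchNorm1d(16)         # training mode: normalizes with batch statistics
out = bn(feats)

# BN normalizes each feature dimension across the batch, but it does not equalize
# the scale gap between the two same-direction samples, so they remain far apart
# in feature space despite sharing a direction.
print((out[0] - out[1]).norm())
```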
no code implementations • CVPR 2022 • Mengjun Cheng, Yipeng Sun, Longchao Wang, Xiongwei Zhu, Kun Yao, Jie Chen, Guoli Song, Junyu Han, Jingtuo Liu, Errui Ding, Jingdong Wang
Visual appearance is considered the most important cue for understanding images in cross-modal retrieval, while the scene text appearing in images can sometimes provide valuable information for understanding the visual semantics.
Ranked #10 on Cross-Modal Retrieval on Flickr30k (using extra training data)
2 code implementations • ICCV 2021 • Hongliang He, Zhongyi Huang, Yao Ding, Guoli Song, Lin Wang, Qian Ren, Pengxu Wei, Zhiqiang Gao, Jie Chen
Specifically, we define the centripetal direction feature as a class of adjacent directions pointing to the nuclear center to represent the spatial relationship between pixels within the nucleus.
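One rough way to read the centripetal direction feature described above is as a per-pixel quantization of the direction pointing to the nucleus centroid. The sketch below follows that reading; the number of direction classes, the centroid choice, and the toy mask are assumptions, not the paper's implementation.

```python
import numpy as np

def centripetal_direction_map(mask, num_directions=8):
    """For each foreground pixel, quantize the direction pointing to the
    nucleus center into one of `num_directions` classes (0 = background)."""
    ys, xs = np.nonzero(mask)
    if len(ys) == 0:
        return np.zeros_like(mask, dtype=np.int64)
    cy, cx = ys.mean(), xs.mean()                  # nuclear center (centroid)
    angles = np.arctan2(cy - ys, cx - xs)          # angle from each pixel to the center
    bins = np.floor((angles + np.pi) / (2 * np.pi) * num_directions).astype(np.int64)
    bins = np.clip(bins, 0, num_directions - 1)
    dir_map = np.zeros_like(mask, dtype=np.int64)
    dir_map[ys, xs] = bins + 1                     # classes 1..num_directions
    return dir_map

# Toy example: a small square "nucleus".
mask = np.zeros((7, 7), dtype=np.uint8)
mask[2:5, 2:5] = 1
print(centripetal_direction_map(mask))
```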
1 code implementation • ACMMM 2019 • Yiling Wu, Shuhui Wang, Guoli Song, Qingming Huang
In this paper, we propose Self-Attention Embeddings (SAEM) to exploit fragment relations in images or texts via a self-attention mechanism, and aggregate fragment information into visual and textual embeddings.
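A compact sketch of the fragment-level self-attention plus aggregation idea behind SAEM, assuming pre-extracted region or word features; the dimensions, mean pooling, and module layout are placeholder choices rather than the released SAEM code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FragmentSelfAttentionEmbed(nn.Module):
    """Self-attention over fragments (image regions or words), then mean-pool
    the contextualized fragments into a single modality embedding."""
    def __init__(self, dim=1024, embed_dim=512, num_heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.proj = nn.Linear(dim, embed_dim)

    def forward(self, fragments):                            # (batch, num_fragments, dim)
        ctx, _ = self.attn(fragments, fragments, fragments)  # model fragment relations
        pooled = ctx.mean(dim=1)                             # aggregate fragments
        return F.normalize(self.proj(pooled), dim=-1)        # modality embedding

# Placeholder region features for a batch of 4 images, 36 regions each.
regions = torch.randn(4, 36, 1024)
img_embed = FragmentSelfAttentionEmbed()(regions)            # (4, 512)
```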
1 code implementation • 14 Aug 2019 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
Multimodal learning aims to discover the relationship between multiple modalities.
no code implementations • ICCV 2017 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
We incorporate the harmonization mechanism into the learning process of multimodal GPLVMs.
no code implementations • ICCV 2015 • Guoli Song, Shuhui Wang, Qingming Huang, Qi Tian
Data from real applications involve multiple modalities representing content with the same semantics and deliver rich information from complementary aspects.