Search Results for author: Yimu Wang

Found 11 papers, 4 papers with code

HaVTR: Improving Video-Text Retrieval Through Augmentation Using Large Foundation Models

no code implementations • 7 Apr 2024 • Yimu Wang, Shuai Yuan, Xiangru Jian, Wei Pang, Mushi Wang, Ning Yu

While recent progress in video-text retrieval has been driven by the exploration of powerful model architectures and training strategies, the representation learning ability of video-text retrieval models is still limited due to low-quality and scarce training data annotations.

Hallucination • Representation Learning +3

Pretext Training Algorithms for Event Sequence Data

no code implementations • 16 Feb 2024 • Yimu Wang, He Zhao, Ruizhi Deng, Frederick Tung, Greg Mori

Pretext training followed by task-specific fine-tuning has been a successful approach in vision and language domains.

Contrastive Learning

InvGC: Robust Cross-Modal Retrieval by Inverse Graph Convolution

1 code implementation • 20 Oct 2023 • Xiangru Jian, Yimu Wang

However, a recent study shows that multi-modal data representations tend to cluster within a limited convex cone (the representation degeneration problem), which hinders retrieval performance due to the inseparability of these representations.

Cross-Modal Retrieval • Retrieval

Balance Act: Mitigating Hubness in Cross-Modal Retrieval with Query and Gallery Banks

1 code implementation • 17 Oct 2023 • Yimu Wang, Xiangru Jian, Bo Xue

In this work, we present a post-processing solution to address the hubness problem in cross-modal retrieval, a phenomenon where a small number of gallery data points are frequently retrieved, resulting in a decline in retrieval performance.

Cross-Modal Retrieval • Retrieval

NICE: CVPR 2023 Challenge on Zero-shot Image Captioning

no code implementations • 5 Sep 2023 • TaeHoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun

In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of the 2023 challenge.

Fairness • Image Captioning

Gradient-Based Word Substitution for Obstinate Adversarial Examples Generation in Language Models

no code implementations • 24 Jul 2023 • Yimu Wang, Peng Shi, Hongyang Zhang

Furthermore, to show the transferability of obstinate word substitutions found by GradObstinate, we replace the words in four representative NLP benchmarks with their obstinate substitutions.

Memorization • MRPC +1

Cooperation or Competition: Avoiding Player Domination for Multi-Target Robustness via Adaptive Budgets

no code implementations • CVPR 2023 • Yimu Wang, Dinghuai Zhang, Yihan Wu, Heng Huang, Hongyang Zhang

We identify a phenomenon named player domination in the bargaining game, namely that the existing max-based approaches, such as MAX and MSD, do not converge.

Video-Text Retrieval by Supervised Sparse Multi-Grained Learning

1 code implementation • 19 Feb 2023 • Yimu Wang, Peng Shi

While recent progress in video-text retrieval has been advanced by the exploration of better representation learning, in this paper, we present a novel multi-grained sparse learning framework, S3MA, to learn an aligned sparse space shared between the video and the text for video-text retrieval.

Ranked #16 on Video Retrieval on MSR-VTT-1kA

Representation Learning • Retrieval +3

Multimodal Federated Learning via Contrastive Representation Ensemble

1 code implementation • 17 Feb 2023 • Qiying Yu, Yang Liu, Yimu Wang, Ke Xu, Jingjing Liu

In this work, we propose Contrastive Representation Ensemble and Aggregation for Multimodal FL (CreamFL), a multimodal federated learning framework that enables training larger server models from clients with heterogeneous model architectures and data modalities, while only communicating knowledge on a public dataset.

Federated Learning • Question Answering +3

Multi-View Fusion Transformer for Sensor-Based Human Activity Recognition

no code implementations • 16 Feb 2022 • Yimu Wang, Kun Yu, Yan Wang, Hui Xue

In this paper, to extract better features and improve performance, we propose a novel method, the multi-view fusion transformer (MVFT), along with a novel attention mechanism.

Human Activity Recognition • Time Series +1

Nearly Optimal Regret for Stochastic Linear Bandits with Heavy-Tailed Payoffs

no code implementations • 28 Apr 2020 • Bo Xue, Guanghui Wang, Yimu Wang, Lijun Zhang

In this paper, we study the problem of stochastic linear bandits with finite action sets.
