20 Mar 2023 • Ye Wang, Bowei Jiang, Changqing Zou, Rui Ma
Existing cross-modal contrastive representation learning (XM-CLR) methods such as CLIP are not fully suitable for multifold data, because they consider only one positive pair per sample and treat all other pairs as negatives when computing the contrastive loss.
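To make the limitation concrete, the following is a minimal NumPy sketch of a CLIP-style symmetric contrastive loss, in which only the diagonal entry of the similarity matrix counts as a positive and every off-diagonal entry is pushed away as a negative. It is an illustration under stated assumptions rather than the authors' method or CLIP's actual implementation: the function names `log_softmax` and `clip_style_loss`, the temperature value of 0.07, and the toy one-hot embeddings are all chosen here for the example.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def clip_style_loss(a, b, temperature=0.07):
    # a, b: (batch, dim) embeddings from two modalities; row i of `a`
    # is assumed to be paired with row i of `b` (hypothetical setup).
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    logits = a @ b.T / temperature  # (batch, batch) cosine similarities

    idx = np.arange(len(a))
    # Only the diagonal (i, i) is treated as positive; every other
    # entry in the same row or column acts as a negative.
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)


# Four distinct samples: each row matches only its own partner.
distinct = np.eye(4)
loss_distinct = clip_style_loss(distinct, distinct)

# Multifold-like case: samples 0 and 1 share the same content, so they
# are valid matches for each other, yet the loss treats them as negatives.
multifold = np.eye(4)[[0, 0, 2, 3]]
loss_multifold = clip_style_loss(multifold, multifold)
```

In this toy setting, `loss_multifold` is noticeably larger than `loss_distinct`: the two rows that share content each incur a penalty of roughly log 2, even though the embeddings are already perfectly aligned. That irreducible penalty is the behavior the sentence above describes for multifold data.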