Search Results for author: Fengyun Rao

Found 12 papers, 4 papers with code

Visual Perception by Large Language Model's Weights

no code implementations • 30 May 2024 • Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

Following this paradigm, we propose VLoRA with the perceptual weights generator.

Multi-Modal Generative Embedding Model

no code implementations • 29 May 2024 • Feipeng Ma, Hongwei Xue, Guangting Wang, Yizhou Zhou, Fengyun Rao, Shilin Yan, Yueyi Zhang, Siying Wu, Mike Zheng Shou, Xiaoyan Sun

Existing models usually tackle these two types of problems by decoupling language modules into a text decoder for generation, and a text encoder for embedding.

ReGenNet: Towards Human Action-Reaction Synthesis

no code implementations • 18 Mar 2024 • Liang Xu, Yizhou Zhou, Yichao Yan, Xin Jin, Wenhan Zhu, Fengyun Rao, Xiaokang Yang, Wenjun Zeng

Humans constantly interact with their surrounding environments.

Decoder

Spatial-Semantic Collaborative Cropping for User Generated Content

1 code implementation • 16 Jan 2024 • Yukun Su, Yiwen Cao, Jingliang Deng, Fengyun Rao, Qingyao Wu

A large amount of User Generated Content (UGC) is uploaded to the Internet daily and displayed to people worldwide through the client side (e.g., mobile and PC).

Image Cropping

Inter-X: Towards Versatile Human-Human Interaction Analysis

no code implementations • 26 Dec 2023 • Liang Xu, Xintao Lv, Yichao Yan, Xin Jin, Shuwen Wu, Congsheng Xu, Yifan Liu, Yizhou Zhou, Fengyun Rao, Xingdong Sheng, Yunhui Liu, Wenjun Zeng, Xiaokang Yang

We also equip Inter-X with versatile annotations of more than 34K fine-grained human part-level textual descriptions, semantic interaction categories, interaction order, and the relationship and personality of the subjects.

Image Captioning with Multi-Context Synthetic Data

no code implementations • 29 May 2023 • Feipeng Ma, Yizhou Zhou, Fengyun Rao, Yueyi Zhang, Xiaoyan Sun

This potential can be harnessed to create synthetic image-text pairs for training captioning models.

Image Captioning Language Modelling +2

A Similarity Alignment Model for Video Copy Segment Matching

1 code implementation • 25 May 2023 • Zhenhua Liu, Feipeng Ma, Tianyi Wang, Fengyun Rao

We propose a Similarity Alignment Model (SAM) for video copy segment matching.

Copy Detection Partial Video Copy Detection +1

A Dual-level Detection Method for Video Copy Detection

1 code implementation • 21 May 2023 • Tianyi Wang, Feipeng Ma, Zhenhua Liu, Fengyun Rao

With the development of multimedia technology, Video Copy Detection has been a crucial problem for social media platforms.

Copy Detection Partial Video Copy Detection +2

Tencent-MVSE: A Large-Scale Benchmark Dataset for Multi-Modal Video Similarity Evaluation

no code implementations • CVPR 2022 • Zhaoyang Zeng, Yongsheng Luo, Zhenhua Liu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen

In this paper, we propose the Tencent-MVSE dataset, which is the first benchmark dataset for the multi-modal video similarity evaluation task.

Automatic Speech Recognition (ASR) +3

CA-SSL: Class-Agnostic Semi-Supervised Learning for Detection and Segmentation

1 code implementation • 9 Dec 2021 • Lu Qi, Jason Kuen, Zhe Lin, Jiuxiang Gu, Fengyun Rao, Dian Li, Weidong Guo, Zhen Wen, Ming-Hsuan Yang, Jiaya Jia

To improve instance-level detection/segmentation performance, existing self-supervised and semi-supervised methods extract either task-unrelated or task-specific training signals from unlabeled data.

Object Detection +2

CLIP4Caption: CLIP for Video Caption

no code implementations • 13 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, Xiu Li

It is noted that our model is only trained on the MSR-VTT dataset.

Decoder Sentence +4

CLIP4Caption ++: Multi-CLIP for Video Caption

no code implementations • 11 Oct 2021 • Mingkang Tang, Zhanyu Wang, Zhaoyang Zeng, Fengyun Rao, Dian Li

In the proposed CLIP4Caption++, we employ an advanced encoder-decoder architecture, X-Transformer, as our main framework and make the following improvements: 1) we utilize three strong pre-trained CLIP models to extract text-related appearance visual features.

Decoder Sentence
