no code implementations • 15 Dec 2023 • Xiaoxu Xu, Yitian Yuan, Qiudan Zhang, Wenhui Wu, Zequn Jie, Lin Ma, Xu Wang
During the inference stage, the learned text-3D correspondence helps ground text queries to the target 3D objects even without 2D images.
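A minimal sketch of the inference step described above, assuming a learned joint embedding space in which text queries and 3D objects can be compared by cosine similarity; the function and variable names here are hypothetical, not the authors' API.

```python
import numpy as np

def ground_text_to_3d(text_embedding: np.ndarray,
                      object_embeddings: np.ndarray) -> int:
    """Return the index of the 3D object whose embedding is closest
    to the text query embedding (cosine similarity)."""
    text = text_embedding / np.linalg.norm(text_embedding)
    objs = object_embeddings / np.linalg.norm(object_embeddings, axis=1, keepdims=True)
    similarities = objs @ text           # one score per candidate 3D object
    return int(np.argmax(similarities))  # grounded target object

# Toy usage: 5 candidate objects with 256-d embeddings, one 256-d text query.
rng = np.random.default_rng(0)
print(ground_text_to_3d(rng.standard_normal(256), rng.standard_normal((5, 256))))
```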
no code implementations • 10 Mar 2022 • Xiaohan Lan, Yitian Yuan, Xin Wang, Long Chen, Zhi Wang, Lin Ma, Wenwu Zhu
New benchmarking results indicate that our proposed evaluation protocols can better monitor research progress.
1 code implementation • 2 Dec 2021 • Yitian Yuan, Lin Ma, Jingwen Wang, Wenwu Zhu
In this paper, we investigate a novel and challenging task, namely controllable video captioning with an exemplar sentence.
1 code implementation • 2 Dec 2021 • Yitian Yuan, Lin Ma, Wenwu Zhu
Enhancing the diversity of the sentences generated to describe video content is an important problem in recent video captioning research.
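One common way to quantify caption diversity is the distinct-n ratio: the fraction of unique n-grams among all n-grams produced across the generated captions. The sketch below is a generic measure of this kind, not necessarily the specific one used in the paper.

```python
from collections import Counter

def distinct_n(captions: list[str], n: int = 2) -> float:
    """Ratio of unique n-grams to total n-grams across all captions."""
    ngrams = Counter()
    for caption in captions:
        tokens = caption.lower().split()
        for i in range(len(tokens) - n + 1):
            ngrams[tuple(tokens[i:i + n])] += 1
    total = sum(ngrams.values())
    return len(ngrams) / total if total else 0.0

captions = ["a man is cooking", "a man is cooking food", "a chef prepares a meal"]
print(f"distinct-2 = {distinct_n(captions, 2):.2f}")
```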
no code implementations • 16 Sep 2021 • Xiaohan Lan, Yitian Yuan, Xin Wang, Zhi Wang, Wenwu Zhu
In this survey, we give a comprehensive overview of TSGV, which i) summarizes the taxonomy of existing methods, ii) provides a detailed description of the evaluation protocols (i.e., datasets and metrics) used in TSGV, and iii) discusses in depth the potential problems of current benchmark designs and research directions for further investigation.
no code implementations • 22 Jan 2021 • Yitian Yuan, Xiaohan Lan, Xin Wang, Long Chen, Zhi Wang, Wenwu Zhu
All the results demonstrate that the re-organized dataset splits and new metric can better monitor the progress in TSGV.
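For context, the standard TSGV metric is "R@n, IoU@m": the fraction of queries for which at least one of the top-n predicted segments overlaps the ground-truth segment with temporal IoU >= m. A minimal sketch of that baseline metric follows (the paper's re-organized splits and discounted variant are not reproduced here):

```python
def temporal_iou(pred, gt):
    """Temporal intersection-over-union of two (start, end) segments."""
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0

def recall_at_n_iou(predictions, ground_truths, n=1, m=0.5):
    """predictions: per-query lists of (start, end) segments, ranked by score."""
    hits = sum(
        any(temporal_iou(p, gt) >= m for p in preds[:n])
        for preds, gt in zip(predictions, ground_truths)
    )
    return hits / len(ground_truths)

preds = [[(2.0, 7.5), (10.0, 14.0)], [(0.0, 3.0)]]
gts = [(2.5, 8.0), (5.0, 9.0)]
print(recall_at_n_iou(preds, gts, n=1, m=0.5))  # 0.5
```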
1 code implementation • NeurIPS 2019 • Yitian Yuan, Lin Ma, Jingwen Wang, Wei Liu, Wenwu Zhu
Temporal sentence grounding in videos aims to detect and localize the target video segment that semantically corresponds to a given sentence.
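A minimal sketch of one generic approach to this task: score sliding-window segment proposals against the sentence embedding and return the best-scoring (start, end). This illustrates the task interface only, not the authors' semantic-conditioned method; all names and the pooling rule are illustrative assumptions.

```python
import numpy as np

def localize(clip_feats: np.ndarray, sent_emb: np.ndarray,
             window: int = 4, stride: int = 2):
    """clip_feats: (T, d) per-clip features; sent_emb: (d,) sentence embedding.
    Returns (start_clip, end_clip) of the best-matching window."""
    sent = sent_emb / np.linalg.norm(sent_emb)
    best, best_score = None, -np.inf
    for start in range(0, len(clip_feats) - window + 1, stride):
        seg = clip_feats[start:start + window].mean(axis=0)  # pool the segment
        score = seg @ sent / np.linalg.norm(seg)             # cosine match
        if score > best_score:
            best, best_score = (start, start + window), score
    return best

rng = np.random.default_rng(1)
print(localize(rng.standard_normal((20, 128)), rng.standard_normal(128)))
```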
1 code implementation • 12 Aug 2019 • Yitian Yuan, Lin Ma, Wenwu Zhu
With the tremendous growth of videos over the Internet, video thumbnails, which provide previews of video content, are becoming increasingly important in shaping users' online search experience.
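A minimal sketch of sentence-specified thumbnail selection under a simple assumption: rank candidate clips by similarity to the query sentence and keep the top-k, in temporal order, as the thumbnail. The names and the plain ranking rule are illustrative, not the paper's method.

```python
import numpy as np

def select_thumbnail_clips(clip_feats: np.ndarray, query_emb: np.ndarray,
                           k: int = 3) -> list[int]:
    """Return indices of the k clips most relevant to the query, in video order."""
    clips = clip_feats / np.linalg.norm(clip_feats, axis=1, keepdims=True)
    query = query_emb / np.linalg.norm(query_emb)
    scores = clips @ query
    topk = np.argsort(scores)[-k:]   # k highest-scoring clips
    return sorted(topk.tolist())     # keep temporal order for the preview

rng = np.random.default_rng(2)
print(select_thumbnail_clips(rng.standard_normal((12, 64)), rng.standard_normal(64)))
```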
no code implementations • 19 Apr 2018 • Yitian Yuan, Tao Mei, Wenwu Zhu
Then, a multi-modal co-attention mechanism is introduced to generate not only video attention, which reflects the global video structure, but also sentence attention, which highlights the crucial details for temporal localization.
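A minimal sketch of a multi-modal co-attention step: a shared cross-modal affinity matrix between clip and word features yields both a video attention over clips and a sentence attention over words. The plain dot-product affinity and max-pooling used here are simplifying assumptions; the paper's exact formulation may differ.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def co_attention(video: np.ndarray, sentence: np.ndarray):
    """video: (T, d) clip features; sentence: (L, d) word features."""
    affinity = video @ sentence.T                # (T, L) cross-modal scores
    video_attn = softmax(affinity.max(axis=1))   # (T,) attention over clips
    sent_attn = softmax(affinity.max(axis=0))    # (L,) attention over words
    attended_video = video_attn @ video          # (d,) attended video summary
    attended_sent = sent_attn @ sentence         # (d,) attended sentence summary
    return attended_video, attended_sent

rng = np.random.default_rng(3)
v, s = co_attention(rng.standard_normal((10, 32)), rng.standard_normal((6, 32)))
print(v.shape, s.shape)  # (32,) (32,)
```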