no code implementations • 10 Aug 2023 • Xinyu Lyu, Jingwei Liu, Yuyu Guo, Lianli Gao
Long-temporal human actions supervise the model to generate multiple scene graphs that conform to the global constraints and avoid the model being unable to learn the tail predicates.
no code implementations • 10 Aug 2023 • Lianli Gao, Xinyu Lyu, Yuyu Guo, Yuxuan Hu, Yuan-Fang Li, Lu Xu, Heng Tao Shen, Jingkuan Song
It integrates two components: Semantic Debiasing (SD) and Balanced Predicate Learning (BPL), for these imbalances.
no code implementations • 23 Jun 2022 • Chaofan Zheng, Xinyu Lyu, Yuyu Guo, Pengpeng Zeng, Jingkuan Song, Lianli Gao
SCM is proposed to relieve semantic deviation by ensuring the semantic consistency between the generated scene graph and the ground truth in global and local representations.
1 code implementation • CVPR 2022 • Xinyu Lyu, Lianli Gao, Yuyu Guo, Zhou Zhao, Hao Huang, Heng Tao Shen, Jingkuan Song
The performance of current Scene Graph Generation models is severely hampered by some hard-to-distinguish predicates, e. g., "woman-on/standing on/walking on-beach" or "woman-near/looking at/in front of-child".
1 code implementation • 22 Feb 2022 • Yuyu Guo, Jingkuan Song, Lianli Gao, Heng Tao Shen
Specifically, the Relational Knowledge represents the prior knowledge of relationships between entities extracted from the visual content, e. g., the visual relationships "standing in", "sitting in", and "lying in" may exist between "dog" and "yard", while the Commonsense Knowledge encodes "sense-making" knowledge like "dog can guard yard".
no code implementations • 22 Feb 2022 • Yuyu Guo, Lianli Gao, Jingkuan Song, Peng Wang, Nicu Sebe, Heng Tao Shen, Xuelong Li
Inspired by this observation, in this article, we propose a relation regularized network (R2-Net), which can predict whether there is a relationship between two objects and encode this relation into object feature refinement and better SGG.
no code implementations • 22 Feb 2022 • Yuyu Guo, Jingqiu Zhang, Lianli Gao
In TS-LSTM, a temporal pooling LSTM (TP-LSTM) is designed to incorporate both spatial and temporal information to extract long-term temporal dynamics within video sub-shots; and a stacked LSTM is introduced to generate a list of words to describe the video.
2 code implementations • 30 Sep 2021 • Yuyu Guo, Lei Bi, Dongming Wei, Liyun Chen, Zhengbin Zhu, Dagan Feng, Ruiyan Zhang, Qian Wang, Jinman Kim
In the first stage, we process the raw dense image to extract sparse landmarks to represent the target organ anatomical topology and discard the redundant information that is unnecessary for motion estimation.
1 code implementation • ICCV 2021 • Yuyu Guo, Lianli Gao, Xuanhan Wang, Yuxuan Hu, Xing Xu, Xu Lu, Heng Tao Shen, Jingkuan Song
The scene graph generation (SGG) task aims to detect visual relationship triplets, i. e., subject, predicate, object, in an image, providing a structural vision layout for scene understanding.
1 code implementation • CVPR 2020 • Yuyu Guo, Lei Bi, Euijoon Ahn, Dagan Feng, Qian Wang, Jinman Kim
SVIN introduces dual networks: first is the spatiotemporal motion network that leverages the 3D convolutional neural network (CNN) for unsupervised parametric volumetric registration to derive spatiotemporal motion field from two-image volumes; the second is the sequential volumetric interpolation network, which uses the derived motion field to interpolate image volumes, together with a new regression-based module to characterize the periodic motion cycles in functional organ structures.
no code implementations • 23 Sep 2019 • Yuyu Guo
Our experimental results on a clinical dataset of 7, 000 IVOCT images demonstrated that our method outperformed the state-of-the-art methods with a recall of 0. 92 and precision of 0. 91 for strut points detection.
no code implementations • 13 Feb 2019 • Lei Bi, Yuyu Guo, Qian Wang, Dagan Feng, Michael Fulham, Jinman Kim
Our approach leverages deep residual architectures and FCNs and learns and infers the location of the optic cup and disk in a step-wise manner with fine-grained details.
no code implementations • 8 Aug 2017 • Jingkuan Song, Yuyu Guo, Lianli Gao, Xuelong. Li, Alan Hanjalic, Heng Tao Shen
In this paper, we propose a generative approach, referred to as multi-modal stochastic RNNs networks (MS-RNN), which models the uncertainty observed in the data using latent stochastic variables.