no code implementations • 24 Aug 2023 • Songchun Zhang, Chunhui Zhao
Further, the GKSA module efficiently summarizes and propagates representative cross-video action knowledge in a learnable manner, promoting a holistic understanding of action patterns; this in turn enables the generation of high-confidence pseudo-labels for self-learning, alleviating ambiguity in temporal localization.
no code implementations • 14 Jan 2023 • Songchun Zhang, Chunhui Zhao
In this paper, we propose a novel Dyna-Depthformer framework, which jointly predicts scene depth and a 3D motion field and aggregates multi-frame information with a transformer.