17 May 2024 • Tao Wu, Shuqiu Ge, Jie Qin, Gangshan Wu, LiMin Wang
Open-vocabulary spatio-temporal action detection (OV-STAD) requires training a model on a limited set of base classes with box and label supervision; the trained model is then expected to generalize well to novel action classes.