1 code implementation • 28 Mar 2024 • Anna Kukleva, Fadime Sener, Edoardo Remelli, Bugra Tekin, Eric Sauser, Bernt Schiele, Shugao Ma
Lately, there has been growing interest in adapting vision-language models (VLMs) to image and third-person video classification due to their success in zero-shot recognition.
no code implementations • 26 Mar 2024 • Sammy Christen, Shreyas Hampali, Fadime Sener, Edoardo Remelli, Tomas Hodan, Eric Sauser, Shugao Ma, Bugra Tekin
In the grasping stage, the model only generates hand motions, whereas in the interaction phase both hand and object poses are synthesized.
1 code implementation • 14 Mar 2024 • Md Salman Shamil, Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
3D hand poses are an under-explored modality for action recognition.
Ranked #1 on 3D Action Recognition on Assembly101
1 code implementation • NeurIPS 2023 • Dibyadip Chatterjee, Fadime Sener, Shugao Ma, Angela Yao
Given a set of verbs and objects observed during training, the goal is to generalize the verbs to an open vocabulary of actions with seen and novel objects.
Ranked #1 on Open Vocabulary Action Recognition on Assembly101 (using extra training data)
no code implementations • 31 Jul 2023 • Guodong Ding, Fadime Sener, Shugao Ma, Angela Yao
Our framework constructs a knowledge base with spatial and temporal beliefs based on observed mistakes.
no code implementations • CVPR 2023 • Takehiko Ohkawa, Kun He, Fadime Sener, Tomas Hodan, Luan Tran, Cem Keskin
To obtain high-quality 3D hand pose annotations for the egocentric images, we develop an efficient pipeline, where we use an initial set of manual annotations to train a model to automatically annotate a much larger dataset.
2 code implementations • 19 Oct 2022 • Guodong Ding, Fadime Sener, Angela Yao
Temporal action segmentation (TAS) aims to densely label the frames of minutes-long videos with multiple action classes.
1 code implementation • CVPR 2022 • Fadime Sener, Dibyadip Chatterjee, Daniel Shelepov, Kun He, Dipika Singhania, Robert Wang, Angela Yao
Assembly101 is a new procedural activity dataset featuring 4321 videos of people assembling and disassembling 101 "take-apart" toy vehicles.
no code implementations • 6 Jun 2021 • Abhinav Rai, Fadime Sener, Angela Yao
Modeling the visual changes that an action brings to a scene is critical for video understanding.
no code implementations • 6 Jun 2021 • Fadime Sener, Rishabh Saraf, Angela Yao
Can we teach a robot to recognize and make predictions for activities that it has never seen before?
1 code implementation • 6 Jun 2021 • Fadime Sener, Dibyadip Chatterjee, Angela Yao
At what temporal scale should they be derived?
Ranked #5 on Action Anticipation on EPIC-KITCHENS-100 (test)
2 code implementations • ECCV 2020 • Fadime Sener, Dipika Singhania, Angela Yao
Future prediction, especially in long-range videos, requires reasoning from current and past observations.
Ranked #2 on Action Anticipation on Assembly101
2 code implementations • CVPR 2019 • Anna Kukleva, Hilde Kuehne, Fadime Sener, Juergen Gall
The task of temporally detecting and segmenting actions in untrimmed videos has received increased attention recently.
no code implementations • 9 Dec 2018 • Divyansh Aggarwal, Elchin Valiyev, Fadime Sener, Angela Yao
When judging style, a key question that often arises is whether or not a pair of objects are compatible with each other.
no code implementations • ICCV 2019 • Fadime Sener, Angela Yao
How can we teach a robot to predict what will happen next for an activity it has never seen before?
no code implementations • CVPR 2018 • Fadime Sener, Angela Yao
This paper presents a new method for unsupervised segmentation of complex activities from video into multiple steps, or sub-activities, without any textual input.
no code implementations • 10 Apr 2017 • Samet Hicsonmez, Nermin Samet, Fadime Sener, Pinar Duygulu
The style was also noticeable in other characters drawn by the same illustrator in different books.