1 code implementation • 1 Dec 2023 • Chinmay Savadikar, Xi Song, Tianfu Wu
We show the output of the first linear layer (i. e., $\omega\cdot \phi$) is surprisingly interpretable, which can play the role of a token-clustering head as a by-product to localize meaningful objects/parts in images for computer vision tasks.
no code implementations • 14 Mar 2023 • Chinmay Savadikar, Michelle Dai, Tianfu Wu
To our knowledge, it is the first attempt of lifelong learning with ViTs on the challenging VDD benchmark.