no code implementations • 21 May 2024 • Xin-Chun Li, Wen-Shu Fan, Bowen Tao, Le Gan, De-Chuan Zhan
Two fundamental observations are: (1) a larger teacher tends to produce probability vectors that are less distinct between non-ground-truth classes; (2) teachers with different capacities are largely consistent in their perception of relative class affinity.
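The snippet does not say how these observations are measured. The sketch below is only an illustration under stated assumptions, not the authors' procedure. "Distinctness" between non-ground-truth classes is proxied by the standard deviation of the renormalized wrong-class probabilities, and "relative class affinity" by the ranking of those wrong classes. The helper names and the two sets of teacher logits are hypothetical toy values, not results from the paper.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with optional temperature scaling."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def non_target_probs(probs, target):
    """Hypothetical helper: renormalize the non-ground-truth probabilities to sum to 1."""
    rest = [p for i, p in enumerate(probs) if i != target]
    s = sum(rest)
    return [p / s for p in rest]

def spread(values):
    """Population standard deviation, used here as a simple proxy for distinctness."""
    mean = sum(values) / len(values)
    return math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))

def rank_order(values):
    """Indices (into the non-target list) sorted from most to least probable."""
    return sorted(range(len(values)), key=lambda i: values[i], reverse=True)

# Hypothetical logits for one sample whose ground-truth class is 0.
small_teacher = [4.0, 2.0, 1.0, -1.0]
large_teacher = [9.0, 1.2, 0.8, -0.2]

results = {}
for name, logits in [("small", small_teacher), ("large", large_teacher)]:
    q = non_target_probs(softmax(logits), target=0)
    results[name] = (spread(q), rank_order(q))
    print(name, round(results[name][0], 3), results[name][1])
```

With these toy logits the larger teacher's wrong-class probabilities have a smaller spread (observation 1), while both teachers rank the wrong classes in the same order (observation 2).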
no code implementations • 10 Oct 2022 • Xin-Chun Li, Wen-Shu Fan, Shaoming Song, Yinchuan Li, Bingshuai Li, Yunfeng Shao, De-Chuan Zhan
Complex teachers tend to be over-confident, and traditional temperature scaling limits the efficacy of class discriminability, resulting in less discriminative wrong-class probabilities.
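The snippet does not describe the paper's remedy. The sketch below only illustrates the general trade-off it refers to, assuming standard temperature scaling (dividing every logit by one shared temperature T before the softmax). Raising T moves probability mass onto the wrong classes, but it also flattens the distinctions among them. The logits and the helper `wrong_class_stats` are hypothetical.

```python
import math

def softmax(logits, temperature=1.0):
    """Numerically stable softmax with a single shared temperature."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def wrong_class_stats(logits, target, temperature):
    """Hypothetical helper returning (total wrong-class mass,
    std of the renormalized wrong-class probabilities)."""
    probs = softmax(logits, temperature)
    rest = [p for i, p in enumerate(probs) if i != target]
    mass = sum(rest)
    q = [p / mass for p in rest]
    mean = sum(q) / len(q)
    std = math.sqrt(sum((v - mean) ** 2 for v in q) / len(q))
    return mass, std

# Hypothetical logits of an over-confident teacher; ground-truth class is 0.
teacher_logits = [10.0, 3.0, 2.0, 0.0]

stats = {t: wrong_class_stats(teacher_logits, target=0, temperature=t)
         for t in (1.0, 4.0, 10.0)}
for t, (mass, std) in stats.items():
    print(f"T={t:>4}: wrong-class mass={mass:.4f}, relative spread={std:.4f}")
```

At T=1 almost all mass sits on the ground-truth class. Larger T exposes more wrong-class mass, but the renormalized wrong-class probabilities drift toward uniform, so they carry less discriminative information.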