no code implementations • 20 Oct 2023 • Logan Frank, Jim Davis
Knowledge distillation (KD) has been a popular and effective method for model compression.
2 code implementations • 26 Oct 2021 • Jim Davis, Logan Frank
Standard initialization of each batch normalization (BN) layer in a network sets its affine transformation scale and shift parameters to 1 and 0, respectively.
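The standard BN affine initialization described above can be checked directly; a minimal sketch assuming PyTorch, where `nn.BatchNorm2d` exposes the scale as `weight` and the shift as `bias`:

```python
import torch
import torch.nn as nn

# A freshly constructed BN layer uses the standard initialization:
# affine scale (weight) = 1, affine shift (bias) = 0.
bn = nn.BatchNorm2d(num_features=8)

scale_is_one = torch.all(bn.weight == 1).item()
shift_is_zero = torch.all(bn.bias == 0).item()
print(scale_is_one, shift_is_zero)  # True True
```

With this default, each BN layer initially applies an identity affine transform after normalization; the scale and shift are then learned during training.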