1 code implementation • 14 Mar 2022 • Yun He, Xue Feng, Cheng Cheng, Geng Ji, Yunsong Guo, James Caverlee
Specifically, in each training iteration and adaptively for each part of the network, the gradient of an auxiliary loss is carefully reduced or enlarged to have a closer magnitude to the gradient of the target loss, preventing auxiliary tasks from being so strong that dominate the target task or too weak to help the target task.