no code implementations • 5 May 2024 • Feiyang Kang, Hoang Anh Just, Yifan Sun, Himanshu Jahagirdar, Yuanzhi Zhang, Rongxing Du, Anit Kumar Sahu, Ruoxi Jia
The goal is to minimize the need for costly domain-specific data for subsequent fine-tuning while achieving desired performance levels.
no code implementations • 22 Apr 2024 • Si Chen, Feiyang Kang, Ning Yu, Ruoxi Jia
Existing approaches to fact tracing rely on assessing the similarity between each training sample and the query along a single dimension, such as lexical overlap, gradient alignment, or embedding-space distance.
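The embedding-space variant of such similarity-based tracing can be pictured with a small numpy sketch (hypothetical helper names; the actual papers may use different embeddings and scoring):

```python
import numpy as np

def embedding_similarity_ranking(query_emb, train_embs):
    """Rank training samples by cosine similarity to a query embedding.

    Illustrative sketch of embedding-space fact tracing: normalize the
    query and every training embedding, score by dot product, and return
    indices sorted from most to least similar.
    """
    q = query_emb / np.linalg.norm(query_emb)
    T = train_embs / np.linalg.norm(train_embs, axis=1, keepdims=True)
    scores = T @ q                      # cosine similarity per training sample
    order = np.argsort(-scores)         # most similar first
    return order, scores
```

For example, with a query embedding `[1, 0]`, a training sample embedded at `[1, 0]` ranks above one at `[1, 1]`, which in turn ranks above one at `[0, 1]`.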
no code implementations • 14 Feb 2024 • Myeongseob Ko, Feiyang Kang, Weiyan Shi, Ming Jin, Zhou Yu, Ruoxi Jia
Inspired by this, we introduce a new method for estimating the influence of training data, which requires calculating gradients for specific test samples, paired with a forward pass for each training point.
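One way to picture a "test gradient plus one forward pass per training point" estimator is the logistic-regression toy below, where each training example's gradient is recoverable from its forward pass alone, so influence reduces to a gradient dot product. This is an illustrative sketch under assumed names and setup, not the paper's actual estimator:

```python
import numpy as np

def influence_scores(W, x_test, y_test, X_train, y_train):
    """Toy influence estimation for a binary logistic-regression layer.

    For logistic regression, grad_w loss = (sigmoid(w.x) - y) * x, so each
    training gradient falls out of a single forward pass (the residual),
    while the test gradient is computed once. Influence of each training
    point is scored as the dot product of its gradient with the test
    gradient (a TracIn-style similarity, used here for illustration).
    """
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    g_test = (sig(x_test @ W) - y_test) * x_test   # one test gradient
    resid = sig(X_train @ W) - y_train             # forward pass per point
    g_train = resid[:, None] * X_train             # per-example gradients
    return g_train @ g_test                        # influence per point
```

A positive score marks a training point whose gradient aligns with the test gradient (e.g. a duplicate of the test sample with the same label); a point with the same label but opposite features scores negative.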
no code implementations • 22 Nov 2023 • Lingjiao Chen, Bilge Acun, Newsha Ardalani, Yifan Sun, Feiyang Kang, Hanrui Lyu, Yongchan Kwon, Ruoxi Jia, Carole-Jean Wu, Matei Zaharia, James Zou
As Machine Learning (ML) systems continue to grow, the demand for relevant and comprehensive datasets becomes imperative.
1 code implementation • 28 Apr 2023 • Hoang Anh Just, Feiyang Kang, Jiachen T. Wang, Yi Zeng, Myeongseob Ko, Ming Jin, Ruoxi Jia
(1) We develop a proxy for the validation performance associated with a training set based on a non-conventional class-wise Wasserstein distance between training and validation sets.
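The class-wise distance in (1) can be illustrated with a toy numpy sketch. All names here are hypothetical; it projects features to 1-D and approximates the Wasserstein-1 distance by comparing quantiles, whereas the paper's distance operates in the full feature space:

```python
import numpy as np

def w1_quantile(u, v, n_q=101):
    """Approximate 1-D Wasserstein-1 distance between two samples by
    averaging |quantile_u(q) - quantile_v(q)| over a grid of quantiles
    (exact W1 in 1-D is the integral of the inverse-CDF difference)."""
    qs = np.linspace(0.0, 1.0, n_q)
    return float(np.mean(np.abs(np.quantile(u, qs) - np.quantile(v, qs))))

def classwise_wasserstein(train_feats, train_labels, val_feats, val_labels):
    """Average per-class W1 distance between training and validation
    feature distributions. Each feature vector is reduced to its mean
    (a 1-D projection) purely to keep the sketch simple."""
    classes = np.intersect1d(np.unique(train_labels), np.unique(val_labels))
    dists = [
        w1_quantile(train_feats[train_labels == c].mean(axis=1),
                    val_feats[val_labels == c].mean(axis=1))
        for c in classes
    ]
    return float(np.mean(dists))
```

With identical training and validation sets the distance is zero, and it grows as the two distributions drift apart, which is what makes it usable as a proxy for how well a training set matches the validation data.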