Search Results for author: Jinglin Yang

Found 1 paper, 1 paper with code

Noise-Robust De-Duplication at Scale

1 code implementation · 9 Oct 2022 · Emily Silcock, Luca D'Amico-Wong, Jinglin Yang, Melissa Dell

Identifying near duplicates within large, noisy text corpora has a myriad of applications, ranging from de-duplicating training datasets, reducing privacy risk, and evaluating test set leakage to identifying reproduced news articles and literature within large corpora.
