no code implementations • ACL 2018 • Arun Chaganty, Stephen Mussmann, Percy Liang
For evaluating generation systems, automatic metrics such as BLEU cost nothing to run but have been shown to correlate poorly with human judgment, leading to systematic bias against certain model improvements.
no code implementations • EMNLP 2017 • Arun Chaganty, Ashwin Paranjape, Percy Liang, Christopher D. Manning
Our first contribution is a new importance-sampling based evaluation which corrects for this bias by annotating a new system's predictions on-demand via crowdsourcing.
2 code implementations • NeurIPS 2015 • Keenon Werling, Arun Chaganty, Percy Liang, Chris Manning
Our goal is to deploy a high-accuracy system starting with zero training examples.