WiNER: A Wikipedia Annotated Corpus for Named Entity Recognition

IJCNLP 2017  ·  Abbas Ghaddar, Phillippe Langlais ·

We revisit the idea of mining Wikipedia in order to generate named-entity annotations. We propose a new methodology that we applied to English Wikipedia to build WiNER, a large, high quality, annotated corpus. We evaluate its usefulness on 6 NER tasks, comparing 4 popular state-of-the art approaches. We show that LSTM-CRF is the approach that benefits the most from our corpus. We report impressive gains with this model when using a small portion of WiNER on top of the CONLL training material. Last, we propose a simple but efficient method for exploiting the full range of WiNER, leading to further improvements.

PDF Abstract IJCNLP 2017 PDF IJCNLP 2017 Abstract

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here