How Much Context Span is Enough? Examining Context-Related Issues for Document-level MT

LREC 2022  ·  Sheila Castilho ·

This paper analyses how much context span is necessary to solve different context-related issues, namely, reference, ellipsis, gender, number, lexical ambiguity, and terminology when translating from English into Portuguese. We use the DELA corpus, which consists of 60 documents and six different domains (subtitles, literary, news, reviews, medical, and legislation). We find that the shortest context span to disambiguate issues can appear in different positions in the document including preceding, following, global, world knowledge. Moreover, the average length depends on the issue types as well as the domain. Moreover, we show that the standard approach of relying on only two preceding sentences as context might not be enough depending on the domain and issue types.

PDF Abstract
No code implementations yet. Submit your code now

Datasets


  Add Datasets introduced or used in this paper

Results from the Paper


  Submit results from this paper to get state-of-the-art GitHub badges and help the community compare results to other papers.

Methods


No methods listed for this paper. Add relevant methods here