A Novel Discourse Parser Based on Support Vector Machine Classification

2 Aug 2009 · David duVerle, Helmut Prendinger

This paper introduces a new algorithm for parsing discourse within the framework of Rhetorical Structure Theory (RST). RST offers a formal framework for hierarchical text organization, with strong applications in discourse analysis and text generation. Our method builds on recent advances in statistical machine learning (the multivariate capabilities of Support Vector Machines) and on a rich feature space. We demonstrate automated annotation of a text with hierarchically organized RST relations, with results comparable to those achieved by specially trained human annotators. Using a rich set of shallow lexical, syntactic and structural features from the input text, our parser runs in linear time and reaches an F-score equal to 73.9% of the inter-annotator agreement F-score of professional human annotators. The parser is 5% to 12% more accurate than current state-of-the-art parsers.
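The abstract does not spell out how the classifiers are turned into a tree. One common way to build an RST tree from a classifier is greedy bottom-up merging: start from the elementary discourse units (EDUs), score every pair of adjacent spans, merge the best-scoring pair, and label the new node. The sketch below shows that strategy under stated assumptions. `structure_score` and `relation_label` are hypothetical stand-ins for trained SVM classifiers (a binary "should these spans merge" scorer and a multi-class relation labeler), and this naive loop rescores every pair on each step, so it is quadratic rather than the linear-time procedure the abstract claims.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Span:
    start: int  # index of the first EDU covered by this span
    end: int    # index of the last EDU covered (inclusive)
    label: Optional[str] = None      # relation label; None for leaf EDUs
    left: Optional["Span"] = None    # left child (None for leaves)
    right: Optional["Span"] = None   # right child (None for leaves)


def greedy_parse(n_edus: int,
                 structure_score: Callable[[Span, Span], float],
                 relation_label: Callable[[Span, Span], str]) -> Span:
    """Greedy bottom-up tree building (illustrative, not the paper's code).

    structure_score and relation_label are hypothetical placeholders for
    a binary structure classifier and a multi-class relation classifier.
    """
    if n_edus < 1:
        raise ValueError("need at least one EDU")
    spans = [Span(i, i) for i in range(n_edus)]  # one leaf per EDU
    while len(spans) > 1:
        # Score every adjacent pair; max() keeps the leftmost pair on ties.
        best = max(range(len(spans) - 1),
                   key=lambda i: structure_score(spans[i], spans[i + 1]))
        left, right = spans[best], spans[best + 1]
        merged = Span(left.start, right.end, relation_label(left, right),
                      left, right)
        spans[best:best + 2] = [merged]  # replace the pair with its parent
    return spans[0]


def to_brackets(span: Span) -> str:
    """Render a tree as a bracketed string, e.g. '(R 0 1)'."""
    if span.left is None or span.right is None:
        return str(span.start)
    return f"({span.label} {to_brackets(span.left)} {to_brackets(span.right)})"


if __name__ == "__main__":
    # Toy scorer: prefer merges that produce the smallest span.
    toy_score = lambda l, r: -(r.end - l.start + 1)
    toy_label = lambda l, r: "R"  # hypothetical single relation label
    print(to_brackets(greedy_parse(4, toy_score, toy_label)))
```

With the toy scorer, four EDUs are merged into `(R (R 0 1) (R 2 3))`. In a real parser the two callables would wrap classifiers applied to features extracted from the two candidate spans.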



Results from the Paper


Task: Discourse Parsing · Dataset: RST-DT · Model: HILDA Parser

Metric                       Value   Global rank
RST-Parseval (Span)          83.0    #9
RST-Parseval (Nuclearity)    68.4    #9
RST-Parseval (Relation)      55.3    #9
RST-Parseval (Full)          54.8    #4
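The RST-Parseval scores above compare the constituents of a predicted tree against those of the gold tree and report an F-score. The sketch below is a simplified, assumed version: each tree is reduced to a multiset of constituent tuples, and precision, recall and F1 are computed over their overlap. Using `(start, end)` tuples gives a span-only score; adding a nuclearity or relation label to each tuple gives the stricter variants. The function name `parseval_f1` is hypothetical, and details such as whether leaf EDUs are counted vary between evaluation scripts.

```python
from collections import Counter
from typing import Hashable, Iterable, Tuple


def parseval_f1(gold: Iterable[Tuple[Hashable, ...]],
                predicted: Iterable[Tuple[Hashable, ...]]
                ) -> Tuple[float, float, float]:
    """Precision, recall and F1 over matching constituents (simplified)."""
    gold_c, pred_c = Counter(gold), Counter(predicted)
    # Multiset intersection: each constituent counts at most min(gold, pred) times.
    matched = sum((gold_c & pred_c).values())
    precision = matched / sum(pred_c.values()) if pred_c else 0.0
    recall = matched / sum(gold_c.values()) if gold_c else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = [(0, 1), (2, 3), (0, 3)]   # tree ((0 1) (2 3))
    pred = [(0, 1), (0, 2), (0, 3)]   # tree (((0 1) 2) 3)
    print(parseval_f1(gold, pred))
```

Here two of the three predicted spans match the gold tree, so precision, recall and F1 are all 2/3.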
