WDC-Dialogue

Introduced by Zhou et al. in EVA: An Open-Domain Chinese Dialogue System with Large-Scale Generative Pre-Training

WDC-Dialogue is a dataset built from the Chinese social media to train EVA. Specifically, conversations from various sources are gathered and a rigorous data cleaning pipeline is designed to enforce the quality of WDC-Dialogue.

The dataset mainly focuses on three categories of textual interaction data, i.e., repost on social media, comment / reply on various online forums and online question and answer (Q&A) exchanges. Each round of these textual interactions yields a dialogue session via well-designed parsing rules.

Homepage