Prosody Labelled Dataset for Hindi

SMP (ICON) 2021 · Esha Banerjee, Atul Kr. Ojha, Girish Jha

This study aims to develop an intonation-labelled database for Hindi to improve prosody in ASR and TTS systems, which would also help in building speech-to-speech machine translation systems. Although no single standard for prosody labelling exists for Hindi, earlier researchers have used perceptual and statistical methods to draw inferences about the behaviour of Hindi prosody patterns. Building on this existing research and on largely agreed-upon theories of Hindi intonation, the study attempts to develop a manually annotated prosodic corpus of Hindi speech data that can be used in future to train speech models for natural-sounding speech. A total of 500 sentences (2,550 words) of declarative and interrogative types have been labelled using Praat.
