Show and Tell Demonstration
Title: Easy Prosody - a GUI for Modifying Text-to-Speech Prosody
Date and Location:
Thursday, April 23, 15:30 - 18:00, Location: Show and Tell Area A
Presented by
Yao Qian, Chang Guo, Frank Soong
Description
Synthesized speech is increasingly used in applications that require spoken responses, e.g. phone-based information queries, reservation and ordering via a speech user interface, speech playback of email/SMS, and spoken appointment reminders. The quality of speech generated by a state-of-the-art Text-to-Speech (TTS) system generally meets the requirements of such applications. Occasionally, however, a sentence is still generated with inappropriate prosody, so a user-friendly interface for easily correcting such prosody is highly desirable. In other applications, such as Computer-Assisted Language Learning (CALL), synthesizing a stylized prosody demonstrated by the teacher, but in the student’s own voice, can be very useful.
This demo will show a speech analysis, modification and synthesis system whose graphical user interface displays different attributes of synthesized speech, including the waveform, spectrogram, loudness and pitch, along with the segmentation boundaries of speech units (phones, syllables, words, etc.). Prosody-related parameters such as duration, pitch and loudness can be conveniently modified by the user with a pen or mouse, after which the corresponding speech is re-synthesized and played back. Modifications can be made at different unit levels: phoneme, syllable, word, phrase or sentence.
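The idea of editing prosody at a chosen unit level can be illustrated with a minimal sketch. It is not the demo system's implementation: the `Phone` record, the `modify_unit` function, and the span layout (each unit mapped to a half-open range of phone indices) are all hypothetical names introduced here for illustration, and the re-synthesis step is not shown.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Phone:
    """One phone-level segment with its prosody parameters (hypothetical)."""
    label: str          # phone symbol
    duration_ms: float  # segment duration in milliseconds
    f0_hz: float        # mean pitch; 0.0 marks an unvoiced segment
    gain_db: float      # loudness change relative to the original


# Assumed segmentation layout: level name -> unit name -> [start, end)
# range of phone indices covered by that unit.
Spans = Dict[str, Dict[str, Tuple[int, int]]]


def modify_unit(
    phones: List[Phone],
    spans: Spans,
    level: str,
    unit: str,
    duration_scale: float = 1.0,
    pitch_semitones: float = 0.0,
    gain_db: float = 0.0,
) -> List[Phone]:
    """Return a copy of `phones` with one unit's prosody modified.

    The same edit is applied to every phone inside the selected unit,
    so a word-level edit and a syllable-level edit differ only in the
    index range they cover.
    """
    start, end = spans[level][unit]
    pitch_ratio = 2.0 ** (pitch_semitones / 12.0)  # semitones -> frequency ratio
    edited = list(phones)
    for i in range(start, end):
        p = phones[i]
        edited[i] = replace(
            p,
            duration_ms=p.duration_ms * duration_scale,
            f0_hz=p.f0_hz * pitch_ratio,  # unvoiced (0.0) stays 0.0
            gain_db=p.gain_db + gain_db,
        )
    return edited
```

Under these assumptions, lengthening and raising the pitch of a single syllable would be a call such as `modify_unit(phones, spans, "syllable", "l-ow", duration_scale=1.5, pitch_semitones=2.0)`, while passing `"word"` or `"sentence"` as the level applies the same edit over a wider range of phones.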
The applications of this technology are:
- Acquisition and learning of natural prosody to improve TTS quality;
- Computer-Assisted Language Learning (CALL);
- Speech synthesis with transplanted or manually specified prosody;
- Singing voice synthesis;
- Speech perception studies.