Show and Tell Demonstration

Title: Pronunciation Clinic

Date and Location:

Thursday, April 23, 15:30 - 18:00, Location: Show and Tell Area A

Presented by

Nobuaki Minematsu, Max Takazawa, Xuebin Ma (The University of Tokyo)

Description

We present a novel demo system that applies new speech technologies to foreign-language pronunciation training, with English as the target language. A participant's (learner's) utterances are automatically compared with those of a teacher, and the system tells the learner which part of his/her pronunciation should be corrected first to become more like the teacher. This demo focuses only on the production of English vowels.

The procedure of the demo is as follows.

1. The participant selects a favorite teacher in the teacher selection window. The candidates include famous Hollywood movie stars, world-famous English phoneticians, and other world-famous individuals such as Princess Diana. The system compares the participant's utterances with those of the selected teacher and feeds corrective instructions back to the participant.

2. The current demo system focuses only on English vowels (monophthongs), so all the participant has to do is read aloud 11 words that contain the 11 English monophthongs.

3. The system internally stores the same word set recorded by each teacher and compares the participant's word utterances with the teacher's. Note that if the two are compared directly, the system will inevitably generate inappropriate instructions: if the participant is a male student and the selected teacher is Princess Diana, the system may first instruct him to produce a female voice. Pronunciation training is not impersonation training, and students do not need to imitate the teacher's utterances acoustically.

4. To solve this problem, we use a "structural representation of pronunciation", in which age, gender, and microphone differences are effectively removed from the speech acoustics. The participant's vowel system and the teacher's are compared using only this representation, and a sub-system of the vowels is detected as inadequate. Information on which vowels to correct first to become more like the teacher is then fed back to the participant visually. Naturally, different teachers will yield somewhat different instructions.

5. Next, our pronunciation database, which contains utterances from about a thousand students, is searched for the speaker whose pronunciation is most similar to the participant's. Even if the participant is an old man, the most similar speaker may be a young girl, because the search is done structurally and age, gender, and microphone differences are effectively ignored. The retrieved speaker and the participant share a very similar vowel system, meaning that they belong to the same dialect group. That speaker would therefore be expected to find the participant's pronunciation the most intelligible.

On the Internet, some websites search for the most similar face or the most similar voice. A huge student database would make it possible to search for the most similar pronunciation anywhere on earth. Such a search requires a technology that ignores age, gender, and microphone differences, and we can provide that technology.

6. The technical novelty of this demo is the structural representation of pronunciation, in which age, gender, and microphone differences are effectively removed from the speech acoustics. As a result, utterances from young children cause no technical problems.

7. A second, smaller demo of a similar kind will also be shown: "Chinese dialect-based speaker classification." About 10 words or syllables are recorded from a Chinese participant, and his/her pronunciation is then plotted among many Chinese speakers of different dialects and sub-dialects. In this demo, too, speaker and microphone differences are removed, and only the geometrical differences between the speakers' pronunciation structures are considered.

8. In both demos, the system's output will be printed and handed to the participant. Each demo takes only about 5 minutes from recording to receiving the printout.
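
The comparison in step 4 can be illustrated with a minimal sketch. The description does not specify how the structural representation is computed, so the following is an assumption for illustration only: each vowel is modeled as a Gaussian (mean vector and covariance matrix) in some acoustic feature space, the structure is the matrix of square-rooted Bhattacharyya distances between all vowel pairs, and vowels are ranked for correction by the total deviation of their edges from the teacher's structure. The names `bhattacharyya`, `structure`, and `vowels_to_correct` are hypothetical; this is not the presenters' implementation.

```python
import numpy as np

# ASSUMPTION (illustrative only): each vowel is a Gaussian (mean, covariance)
# in an acoustic feature space; this is not the demo system's actual code.

def bhattacharyya(m1, S1, m2, S2):
    """Bhattacharyya distance between Gaussians N(m1, S1) and N(m2, S2)."""
    S = (S1 + S2) / 2.0
    d = m1 - m2
    mean_term = d @ np.linalg.solve(S, d) / 8.0
    _, logdet_s = np.linalg.slogdet(S)
    _, logdet_1 = np.linalg.slogdet(S1)
    _, logdet_2 = np.linalg.slogdet(S2)
    cov_term = 0.5 * (logdet_s - 0.5 * (logdet_1 + logdet_2))
    return mean_term + cov_term

def structure(vowels):
    """Symmetric matrix of sqrt(Bhattacharyya) distances between all vowel pairs."""
    n = len(vowels)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            bd = bhattacharyya(*vowels[i], *vowels[j])
            D[i, j] = D[j, i] = np.sqrt(max(bd, 0.0))  # guard tiny negative round-off
    return D

def vowels_to_correct(learner, teacher, labels, top_k=1):
    """Hypothetical ranking: vowels whose edges deviate most from the teacher's."""
    deviation = np.abs(structure(learner) - structure(teacher))
    score = deviation.sum(axis=1)          # total edge deviation per vowel
    order = np.argsort(score)[::-1]        # largest deviation first
    return [labels[i] for i in order[:top_k]]

# Toy example: three vowels in a 2-D feature space.
labels = ["i", "a", "u"]
I2 = np.eye(2)
teacher = [(np.array([0.0, 0.0]), I2), (np.array([4.0, 0.0]), I2), (np.array([0.0, 4.0]), I2)]

# The learner's "voice" is an affine map of the teacher's feature space
# (standing in for speaker/microphone differences), plus one misplaced vowel /u/.
A = np.array([[1.5, 0.3], [0.0, 0.8]])
b = np.array([2.0, -1.0])
learner = [(A @ m + b, A @ S @ A.T) for m, S in teacher]
learner[2] = (A @ np.array([2.0, 2.0]) + b, learner[2][1])

print(vowels_to_correct(learner, teacher, labels))  # ['u']
```

Every learner mean differs from the teacher's, yet because the affine map cancels out of each Bhattacharyya distance, only the misplaced /u/ stands out in the structural comparison.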
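
Once every speaker is represented by a structure, the database search in step 5 reduces to nearest-neighbour retrieval. The sketch below assumes, for illustration, that each structure is a vowel-by-vowel distance matrix with the same size and vowel order for all speakers, and that two speakers are compared by the Euclidean distance between the upper-triangular parts of their matrices. Both this metric and the names `structure_distance` and `most_similar_speaker` are hypothetical.

```python
import numpy as np

def structure_distance(D1, D2):
    """Euclidean distance between the upper-triangular edges of two structures."""
    iu = np.triu_indices_from(D1, k=1)   # each vowel pair counted once
    return float(np.linalg.norm(D1[iu] - D2[iu]))

def most_similar_speaker(query, database):
    """Linear scan over {speaker_id: structure}; returns (speaker_id, distance)."""
    scored = ((structure_distance(query, D), sid) for sid, D in database.items())
    dist, sid = min(scored)
    return sid, dist

# Toy 3-vowel structures (hypothetical values).
query = np.array([[0.0, 1.0, 2.0], [1.0, 0.0, 1.5], [2.0, 1.5, 0.0]])
database = {
    "spk001": np.array([[0.0, 1.1, 2.0], [1.1, 0.0, 1.4], [2.0, 1.4, 0.0]]),
    "spk002": np.array([[0.0, 2.0, 1.0], [2.0, 0.0, 2.0], [1.0, 2.0, 0.0]]),
}
print(most_similar_speaker(query, database)[0])  # spk001
```

For a database of about a thousand speakers, a linear scan is cheap; a much larger collection could instead index the flattened edge vectors with a nearest-neighbour search structure.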
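
The claim in step 6, that a structural representation can discard speaker and microphone differences, can be motivated with a short derivation. It rests on assumptions not stated in the description: that such differences act approximately as an invertible affine transform $x \mapsto Ax + b$ of the acoustic feature vectors (a common approximation for vocal-tract and channel effects in cepstral features), and that each vowel $k$ is modeled as a Gaussian $\mathcal{N}(\mu_k, \Sigma_k)$ with the Bhattacharyya distance $D_B$ as the inter-vowel measure.

```latex
\begin{align*}
D_B &= \tfrac{1}{8}\,\delta^\top \Sigma^{-1}\delta
      + \tfrac{1}{2}\ln\frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}},
  \qquad \delta = \mu_1 - \mu_2,\quad \Sigma = \tfrac{1}{2}(\Sigma_1 + \Sigma_2).\\[4pt]
x \mapsto Ax + b:&\quad \mu_k \mapsto A\mu_k + b,\quad \Sigma_k \mapsto A\Sigma_k A^\top
  \;\Rightarrow\; \delta \mapsto A\delta,\quad \Sigma \mapsto A\Sigma A^\top.\\[4pt]
(A\delta)^\top (A\Sigma A^\top)^{-1} (A\delta)
  &= \delta^\top A^\top A^{-\top}\Sigma^{-1}A^{-1}A\,\delta
   = \delta^\top\Sigma^{-1}\delta,\\
\frac{\det(A\Sigma A^\top)}{\sqrt{\det(A\Sigma_1A^\top)\,\det(A\Sigma_2A^\top)}}
  &= \frac{(\det A)^2\det\Sigma}{(\det A)^2\sqrt{\det\Sigma_1\,\det\Sigma_2}}
   = \frac{\det\Sigma}{\sqrt{\det\Sigma_1\,\det\Sigma_2}}.
\end{align*}
```

Both terms of $D_B$ are unchanged for any $A$ with $\det A \neq 0$, so the whole matrix of pairwise vowel distances, i.e. the structure, is unchanged as well.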
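
Step 7 plots a participant among many speakers, but the description does not say how the plot is produced. One standard option, assumed here purely for illustration, is to compute a speaker-by-speaker matrix of structure distances and embed it in two dimensions with classical multidimensional scaling (MDS). The names `speaker_distance_matrix` and `classical_mds` are hypothetical, and each speaker's structure is assumed to be a syllable-by-syllable distance matrix of the same size and order.

```python
import numpy as np

def speaker_distance_matrix(structures):
    """Pairwise Euclidean distances between speakers' upper-triangular edge vectors."""
    iu = np.triu_indices_from(structures[0], k=1)
    edges = np.array([S[iu] for S in structures])          # one row per speaker
    diff = edges[:, None, :] - edges[None, :, :]
    return np.linalg.norm(diff, axis=-1)

def classical_mds(D, dim=2):
    """Coordinates whose Euclidean distances approximate the distance matrix D."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n                    # centering matrix
    B = -0.5 * J @ (D ** 2) @ J                            # double-centred Gram matrix
    eigvals, eigvecs = np.linalg.eigh(B)                   # ascending eigenvalues
    top = np.argsort(eigvals)[::-1][:dim]                  # keep the largest `dim`
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))

# Toy data: five speakers, each with a random symmetric 4x4 structure.
rng = np.random.default_rng(0)
structures = []
for _ in range(5):
    M = rng.random((4, 4))
    S = M + M.T
    np.fill_diagonal(S, 0.0)
    structures.append(S)

coords = classical_mds(speaker_distance_matrix(structures))
print(coords.shape)  # (5, 2)
```

Each row of `coords` is one speaker's 2-D position, which can be passed to any plotting tool, with the participant's row highlighted among the dialect groups.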


©2016 Conference Management Services, Inc. -||- email: webmaster@icassp09.com -||- Last updated Thursday, February 12, 2009