Show and Tell Demonstration

Title: An Instrumental Speech Enhancement System Quality Assessment Option in the New ITU-T P.1100 Recommendation: A Tool Presentation

Date and Location:

Wednesday, April 22, 15:00 - 17:30, Location: Show and Tell Area A

Presented by

Tim Fingscheidt, Suhadi Suhadi

Description

Quality assessment of speech enhancement systems is generally conducted via subjective listening tests or objective measurements. The most important parameters are speech distortion, residual noise level, and residual echo level. Usually, performance is evaluated directly on the enhanced signal. However, this makes it difficult to judge these parameters independently and precisely, especially during double talk.

As a solution, the enhanced signal can be split into its signal components. For this purpose, each additive component of the microphone signal (i.e., clean speech, noise, and echo) is processed separately through the sending direction (i.e., uplink) of the enhancement system. To do so, the enhancement system's behaviour must first be logged while it processes the noisy and echoic input speech. This approach relies on several assumptions. The most problematic is probably that the internal processing of the speech enhancement system must be known beforehand (“white box” test).
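The white-box separation described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the presenters' implementation. The enhancement system is reduced to a hypothetical frame-wise, broadband, Wiener-like gain (`log_gains`), the noise power is assumed known, and all signals are toy sequences. Real systems apply per-frequency gains and adaptive echo cancellation, whose behaviour would also have to be logged. Because the logged gains are replayed identically and linearly on each component, the processed speech, noise, and echo components add up to the enhanced signal.

```python
from typing import List

FRAME = 4  # hypothetical frame length in samples (toy value)


def frame_power(x: List[float], start: int) -> float:
    """Mean power of the frame of x that begins at sample `start`."""
    seg = x[start:start + FRAME]
    return sum(v * v for v in seg) / len(seg)


def log_gains(y: List[float], noise_power: float, floor: float = 0.1) -> List[float]:
    """Run a hypothetical enhancer (frame-wise Wiener-like gain) on the
    microphone signal y and log the gain it applies in each frame."""
    gains = []
    for start in range(0, len(y), FRAME):
        snr = max(frame_power(y, start) - noise_power, 0.0) / noise_power
        gains.append(max(snr / (1.0 + snr), floor))  # gain floor limits attenuation
    return gains


def apply_gains(x: List[float], gains: List[float]) -> List[float]:
    """Replay the logged per-frame gains on an arbitrary input signal."""
    return [v * gains[i // FRAME] for i, v in enumerate(x)]


# Toy additive components of the microphone signal (illustrative values only).
s = [0.0, 0.0, 0.0, 0.0, 1.0, -1.0, 0.8, -0.6]    # clean near-end speech
n = [0.1, -0.1, 0.1, -0.1, 0.1, -0.1, 0.1, -0.1]  # background noise
d = [0.0, 0.0, 0.0, 0.0, 0.2, 0.1, -0.2, 0.0]     # simulated echo
y = [a + b + c for a, b, c in zip(s, n, d)]        # microphone signal

gains = log_gains(y, noise_power=0.01)   # step 1: log behaviour on the mixture
enhanced = apply_gains(y, gains)         # enhanced uplink signal
# Step 2: replay the same gains on each component separately.
s_t, n_t, d_t = (apply_gains(c, gains) for c in (s, n, d))
```

The noise-only first frame receives the floor gain, while the double-talk second frame passes almost unattenuated. Each component is nevertheless scaled by exactly the gain the mixture received, which is what makes the separation consistent.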

The new ITU-T P.1100 Recommendation "Narrowband Hands Free Communication in Motor Vehicles" offers an interesting instrumental test option that can be applied to unknown ("black box") systems such as hardware or compiled software realizations. It is also attractive for research on speech enhancement systems, since some intuitive white box assumptions and restrictions do not apply. Building on prior work on signal separation, we developed a MATLAB GUI-based tool that lets visitors experience the following at the demo booth:

We can record live background noise (conference noise) or choose another pre-recorded noise type.
Then we record clean (close-talk) speech.
Next, a hands-free system under test is started with a far-end signal. The just-recorded speech and noise are added to a simulated echo signal, and the important uplink processing is performed, yielding the enhanced speech signal.
The most exciting part is the separation of the enhanced signal into its three components: distorted speech, attenuated and distorted noise, and attenuated and distorted echo. Perhaps for the first time, you will be able to listen to such signal components taken from a pure double-talk test case. In addition, we will show how instrumental measures such as PESQ MOS can be used in a sound manner to judge the amount of speech distortion.
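Once the enhanced signal has been separated into components, the three parameters named in the description can be quantified independently. The sketch below illustrates this under loud assumptions and is not the tool's method. Residual noise and echo are expressed as attenuation in dB. Speech distortion uses a simple SNR between the clean speech and the processed speech component, a swapped-in stand-in for PESQ MOS. The idea it illustrates is that the distortion measure compares the clean speech only with the processed speech component, so residual noise and echo are not counted as speech distortion. A real evaluation would instead pass the clean speech as reference and the processed speech component as degraded signal to a PESQ (ITU-T P.862) implementation. All signal values are toy data.

```python
import math
from typing import List


def energy(x: List[float]) -> float:
    """Sum of squared samples."""
    return sum(v * v for v in x)


def attenuation_db(before: List[float], after: List[float]) -> float:
    """Level reduction of one component (noise or echo) in dB:
    10*log10(energy before processing / energy after processing)."""
    return 10.0 * math.log10(energy(before) / energy(after))


def speech_snr_db(clean: List[float], processed: List[float]) -> float:
    """Stand-in speech-distortion measure (not PESQ): SNR of the processed
    speech component relative to the clean speech; higher means less
    distortion. Assumes the error signal is not exactly zero."""
    error = [p - c for c, p in zip(clean, processed)]
    return 10.0 * math.log10(energy(clean) / energy(error))


# Toy clean and processed components (illustrative values only).
s, s_t = [1.0, -1.0, 1.0, -1.0], [0.9, -0.9, 0.9, -0.9]        # speech
n, n_t = [0.1, -0.1, 0.1, -0.1], [0.01, -0.01, 0.01, -0.01]    # noise
d, d_t = [0.2, 0.0, -0.2, 0.0], [0.002, 0.0, -0.002, 0.0]      # echo

speech_quality = speech_snr_db(s, s_t)   # speech distortion only
noise_att = attenuation_db(n, n_t)       # residual noise level
echo_att = attenuation_db(d, d_t)        # residual echo level
```

With these toy values the speech component scores 20 dB, the noise is attenuated by 20 dB, and the echo by 40 dB, each judged without interference from the other two components.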