Estimating the Quality of Synthesized and Natural Speech Transmitted Through Telephone Networks Using Single-ended Prediction Models

Author: Möller, Sebastian; Kim, Doh-Suk; Malfait, Ludovic

Publisher: S. Hirzel Verlag

ISSN: 1610-1928

Source: Acta Acustica united with Acustica, Vol. 94, Iss. 1, 2008-01, pp. 21-31

Abstract

This paper reports on experiments to estimate the speech output quality of telephone services instrumentally, using single-ended quality prediction models. It addresses both naturally produced speech and synthesized speech generated with a Text-To-Speech (TTS) system. Three auditory tests were carried out in which typical speech samples were transmitted over various telephone channels and then judged by listeners with respect to their overall quality. The mean auditory ratings obtained in these tests were compared to estimates provided by three different single-ended models, one of which is currently recommended by the International Telecommunication Union for predicting the quality of naturally produced speech. Correlations between auditory and estimated quality scores vary considerably between experiments. It is concluded that the single-ended models mainly predict the effects of the transmission channel, but not those of the (naturally produced or synthesized) source speech material.
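The comparison described in the abstract, mean auditory ratings per test condition set against model estimates, can be illustrated with a minimal Python sketch. It assumes the Pearson correlation coefficient as the agreement measure; the abstract does not state which coefficient the authors used. The function names `mean_rating` and `pearson_r` and all numbers are hypothetical placeholders, not data or code from the paper.

```python
from math import sqrt


def mean_rating(ratings):
    """Average the individual listener ratings for one test condition."""
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / len(ratings)


def pearson_r(xs, ys):
    """Pearson correlation between two equally long score sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical listener ratings on a 1-5 scale, one list per
    # transmission condition. Placeholder values, NOT from the paper.
    listener_ratings = [
        [4, 5, 4, 4],
        [3, 3, 4, 2],
        [2, 1, 2, 2],
    ]
    # Hypothetical single-ended model estimates for the same conditions.
    model_estimates = [4.1, 3.4, 2.2]

    auditory_means = [mean_rating(r) for r in listener_ratings]
    print(f"r = {pearson_r(auditory_means, model_estimates):.3f}")
```

Running the same calculation separately for each auditory test would show whether the correlation varies between experiments, which is the kind of comparison the abstract reports.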