Detection of phonological features in continuous speech using neural networks

Author: King S.; Taylor P.

Publisher: Academic Press

ISSN: 0885-2308

Source: Computer Speech & Language, Vol. 14, Iss. 4, 2000-10, pp. 333-353

Abstract

We report work on the first component of a two-stage speech recognition architecture based on phonological features rather than phones. This paper reports experiments on three phonological feature systems: (1) the Sound Pattern of English (SPE) system, which uses binary features; (2) a multi-valued (MV) feature system, which uses traditional phonetic categories such as manner, place, etc.; and (3) Government Phonology (GP), which uses a set of structured primes. All experiments used recurrent neural networks to perform feature detection. In these networks the input layer is a standard framewise cepstral representation, and the output layer represents the values of the features. The system effectively produces a representation of the most likely phonological features for each input frame. All experiments were carried out on the TIMIT speaker-independent database. The networks performed well in all cases, with the average accuracy for a single feature ranging from 86% to 93%. We describe these experiments in detail, and discuss the justification and potential advantages of using phonological features rather than phones as the basis of speech recognition.
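As a rough illustration of the framewise setup described in the abstract, the following minimal sketch passes a sequence of cepstral frames through a simple recurrent network and scores each binary feature frame by frame. It is not the authors' implementation: the Elman-style recurrence, the layer sizes, the 0.5 decision threshold, the random weights, and the made-up reference labels are all assumptions chosen only to keep the example self-contained.

```python
import math
import random


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def detect_features(frames, w_in, w_rec, w_out):
    """Map each input frame to one activation in (0, 1) per feature.

    Assumed Elman-style recurrence: the hidden state at each frame depends
    on the current frame and on the hidden state of the previous frame.
    """
    hidden = [0.0] * len(w_rec)
    outputs = []
    for frame in frames:
        from_input = matvec(w_in, frame)
        from_state = matvec(w_rec, hidden)
        hidden = [math.tanh(a + b) for a, b in zip(from_input, from_state)]
        # Sigmoid output layer: one unit per binary (SPE-style) feature.
        outputs.append([1.0 / (1.0 + math.exp(-z)) for z in matvec(w_out, hidden)])
    return outputs


def per_feature_accuracy(decisions, reference):
    """Fraction of frames on which each feature matches the reference."""
    n_frames = len(decisions)
    n_features = len(decisions[0])
    return [
        sum(d[k] == r[k] for d, r in zip(decisions, reference)) / n_frames
        for k in range(n_features)
    ]


def random_matrix(rng, rows, cols, scale=0.1):
    """Untrained placeholder weights; a real system would learn these."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


rng = random.Random(0)
n_frames, n_cepstra, n_hidden, n_features = 50, 13, 16, 4  # illustrative sizes

frames = [[rng.gauss(0.0, 1.0) for _ in range(n_cepstra)] for _ in range(n_frames)]
w_in = random_matrix(rng, n_hidden, n_cepstra)
w_rec = random_matrix(rng, n_hidden, n_hidden)
w_out = random_matrix(rng, n_features, n_hidden)

activations = detect_features(frames, w_in, w_rec, w_out)
decisions = [[int(a > 0.5) for a in row] for row in activations]

# Made-up reference labels, standing in for frame-level feature annotations.
reference = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(n_frames)]
accuracy = per_feature_accuracy(decisions, reference)
print(accuracy)
```

Because the weights here are untrained and the labels are random, the printed accuracies are meaningless as results; the sketch only shows the shape of the computation, with one feature vector produced per input frame and accuracy measured separately for each feature.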