A preliminary study of voice quality transformation based on modifications to the neutral vocal tract area function

Authors: Story B.H.; Titze I.R.

Publisher: Academic Press

ISSN: 0095-4470

Source: Journal of Phonetics, Vol. 30, Iss. 3, 2002-07, pp. 485-509

Abstract

This study pursues the idea that voice quality can be partially represented by the underlying shape of a speaker's neutral vocal tract. Using an area function model, which allows direct access to the neutral tract shape, four separate modifications were made to one male speaker's vocal tract. The modifications impose constrictive or expansive effects on the pharyngeal and oral portions of the neutral area function, as well as on the lip aperture and the epilaryngeal tube. A single-word utterance was first synthesized by superimposing deformation patterns appropriate for the word onto the original neutral tract shape (area function). Four additional samples of the word were then synthesized, each using a different modified neutral area function. The modifications were assessed by comparing the F1–F2 formant trajectories of the original utterance with those of the modified versions. The formant frequencies were observed to shift within the F1–F2 plane in directions predictable from simple tube acoustics. However, the modified voice qualities did not preserve the shape of the original F1–F2 trajectory. In other words, the modifications did not produce a simple linear transformation of the formant frequencies, even though the “articulatory dynamics” (deformation patterns of the area function) were identical in all cases. These somewhat artificial vocal tract modifications were also compared with formant frequencies extracted from recordings of a speaker attempting to produce the same types of modifications. In general, the speaker's formant trajectories showed some similarities to the synthesized versions. However, the speaker also appeared to grade the “level” of voice quality imposed on the utterance, depending on whether the demands of the voice quality competed with the linguistic demands of a given phonetic segment.
Finally, to demonstrate this type of voice quality modification in a broader context, the same procedures were applied to sentence-level speech and results were again shown as F1–F2 formant trajectories.
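
As a rough illustration of the "simple tube acoustics" mentioned in the abstract, the following Python sketch estimates the resonances of a lossless concatenated-tube model (closed at the glottis, ideally open at the lips) before and after a pharyngeal constriction. It is not the authors' area function model or synthesizer. The function names (chain_d, formants, scale_region), the sound speed, the 17.5 cm tract length, the 44-section discretization, the uniform 3 cm^2 "neutral" shape, and the 0.25 scaling factor are all assumptions chosen for illustration.

```python
import math

SOUND_SPEED = 35000.0  # cm/s; assumed value for warm, moist air


def chain_d(areas, section_len, freq):
    """Return the D element of the lossless chain matrix, glottis to lips."""
    k = 2.0 * math.pi * freq / SOUND_SPEED
    cs, sn = math.cos(k * section_len), math.sin(k * section_len)
    # The matrix has the form [[a, j*b], [j*c, d]]; a, b, c, d stay real
    # for a lossless tube, so only real arithmetic is needed.
    a, b, c, d = 1.0, 0.0, 0.0, 1.0
    for area in areas:
        z = 1.0 / area  # characteristic impedance up to rho*c, which cancels in D
        a, b, c, d = (a * cs - b * sn / z, a * z * sn + b * cs,
                      c * cs + d * sn / z, d * cs - c * z * sn)
    return d


def formants(areas, section_len, f_max=4000.0, step=5.0):
    """Find resonances as zeros of D(f): grid search plus bisection."""
    found = []
    f_lo = step
    pos_lo = chain_d(areas, section_len, f_lo) >= 0.0
    while f_lo + step <= f_max:
        f_hi = f_lo + step
        pos_hi = chain_d(areas, section_len, f_hi) >= 0.0
        if pos_hi != pos_lo:  # sign change brackets a resonance
            lo, hi = f_lo, f_hi
            for _ in range(40):
                mid = 0.5 * (lo + hi)
                if (chain_d(areas, section_len, mid) >= 0.0) == pos_lo:
                    lo = mid
                else:
                    hi = mid
            found.append(0.5 * (lo + hi))
        f_lo, pos_lo = f_hi, pos_hi
    return found


def scale_region(areas, start, stop, factor):
    """Scale the areas of sections start..stop-1 (hypothetical modification)."""
    return [a * factor if start <= i < stop else a
            for i, a in enumerate(areas)]


n_sections = 44                      # assumed discretization
section_len = 17.5 / n_sections      # cm; assumed 17.5 cm tract
neutral = [3.0] * n_sections         # cm^2; uniform stand-in for a neutral shape
# Narrow the back (glottal) half of the tube as a crude pharyngeal constriction.
constricted = scale_region(neutral, 0, n_sections // 2, 0.25)

print("neutral F1, F2:    ", formants(neutral, section_len)[:2])
print("constricted F1, F2:", formants(constricted, section_len)[:2])
```

For the uniform tube the resonances follow the quarter-wavelength pattern (close to 500 and 1500 Hz under these assumptions). Narrowing the back half raises F1 and lowers F2, which is the kind of directional shift in the F1–F2 plane that simple tube acoustics predicts. A realistic neutral area function is not uniform, so the numbers here only indicate direction, not the magnitudes reported in the study.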