Xueli LI(栗学丽), Weidong ZHOU(周卫东)
School of Information Science and Engineering, Shandong University, Jinan 250100, China
Abstract-An Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency, because the glottis is semi-open and turbulent flow is created, and formant shifting occurs in the lower frequency region, owing to the narrowing of the tract in the false vocal fold region and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, a method based on an ARMA process is therefore superior to one based on an AR process for the spectral analysis of whispered speech. Two methods, the least-squares modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model of whispered speech. The performance evaluation shows that the ARMA model is much more appropriate for representing whispered speech than the AR model, and that the FDSM algorithm provides a more accurate estimate of the whispered speech spectral envelope than the LSMY algorithm, at the cost of higher computational complexity.
Key words-ARMA model; AR model; whispered speech; LSMY algorithm; FDSM algorithm
Manuscript Number: 1674-8042(2010)03-0300-04
doi: 10.3969/j.issn.1674-8042.2010.03.22
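The abstract's central idea, that pole-zero (ARMA) structure can be estimated directly from the signal, can be illustrated with a minimal sketch of the autoregressive step of a least-squares modified Yule-Walker estimator. This follows the standard textbook formulation (for an ARMA(p, q) process the autocorrelation satisfies r[k] + a_1 r[k-1] + ... + a_p r[k-p] = 0 for k > q) and is not necessarily the exact LSMY procedure used in the paper. The function names, the model orders, the number of equations, and the synthetic test signal are all assumptions made for illustration; the moving-average part is not estimated here.

```python
import numpy as np


def simulate_arma(b, a, n, rng):
    """Filter white Gaussian noise through B(z)/A(z); a[0] is assumed to be 1."""
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(n):
        acc = 0.0
        for j in range(len(b)):          # moving-average (zero) part
            if t - j >= 0:
                acc += b[j] * e[t - j]
        for i in range(1, len(a)):       # autoregressive (pole) part
            if t - i >= 0:
                acc -= a[i] * x[t - i]
        x[t] = acc
    return x


def autocorr(x, max_lag):
    """Biased sample autocorrelation for lags 0..max_lag."""
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


def lsmyw_ar(x, p, q, n_eq):
    """Least-squares solution of the modified Yule-Walker equations.

    Uses lags q+1 .. q+n_eq (n_eq >= p), giving an overdetermined system
    when n_eq > p. Returns [1, a_1, ..., a_p].
    """
    r = autocorr(x, q + n_eq)
    lags = np.arange(q + 1, q + n_eq + 1)
    # Row for lag k holds r[k-1], ..., r[k-p]; r is symmetric, hence abs().
    R = np.array([[r[abs(k - i)] for i in range(1, p + 1)] for k in lags])
    rhs = -r[lags]
    coeffs, *_ = np.linalg.lstsq(R, rhs, rcond=None)
    return np.concatenate(([1.0], coeffs))


# Hypothetical ARMA(2, 1) test signal (not whispered-speech data).
rng = np.random.default_rng(0)
a_true = [1.0, -1.5, 0.7]
b_true = [1.0, 0.5]
x = simulate_arma(b_true, a_true, 100_000, rng)
a_hat = lsmyw_ar(x, p=2, q=1, n_eq=8)
print("true AR coefficients:     ", a_true)
print("estimated AR coefficients:", a_hat)
```

Starting the equations at lag q+1 is what distinguishes this from an ordinary Yule-Walker (pure AR) fit: the lower lags are affected by the zeros of the model and are deliberately excluded.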