ARMA Modelling for Whispered Speech

Xueli LI(栗学丽), Weidong ZHOU(周卫东)


School of Information Science and Engineering, Shandong University, Jinan 250100, China

 

Abstract-An Autoregressive Moving Average (ARMA) model for whispered speech is proposed. Compared with normal speech, whispered speech has no fundamental frequency, because the glottis is semi-open and turbulent flow is created, and its formants are shifted in the lower frequency region, owing to the narrowing of the tract in the false vocal fold regions and weak acoustic coupling with the subglottal system. Analysis shows that the effect of the subglottal system is to introduce additional pole-zero pairs into the vocal tract transfer function. Theoretically, therefore, a method based on an ARMA process is superior to one based on an AR process for the spectral analysis of whispered speech. Two methods, the least-squares modified Yule-Walker likelihood estimate (LSMY) algorithm and the Frequency-Domain Steiglitz-McBride (FDSM) algorithm, are applied to the ARMA model for whispered speech. The performance evaluation shows that the ARMA model is much more appropriate than the AR model for representing whispered speech, and that the FDSM algorithm estimates the spectral envelope of whispered speech more accurately than the LSMY algorithm, albeit with higher computational complexity.
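
As a rough illustration of the modified Yule-Walker idea named in the abstract, the following Python sketch estimates only the AR (pole) coefficients of an ARMA(p, q) process by solving an overdetermined set of modified Yule-Walker equations in the least-squares sense. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (`simulate_arma`, `autocorr`, `lsmyw_ar`), the synthetic test signal, and the choice of ten equations are all hypothetical, and the MA (zero) part of the model is not estimated here.

```python
import numpy as np


def simulate_arma(a, b, n, seed=0):
    """Synthetic ARMA signal (assumed test data, not whispered speech):
    x[t] = -sum_i a[i] * x[t-1-i] + sum_j b[j] * e[t-j], with white noise e."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(n)
    x = np.zeros(n)
    for t in range(n):
        ar = sum(a[i] * x[t - 1 - i] for i in range(len(a)) if t - 1 - i >= 0)
        ma = sum(b[j] * e[t - j] for j in range(len(b)) if t - j >= 0)
        x[t] = ma - ar
    return x


def autocorr(x, max_lag):
    """Biased sample autocorrelation r[0..max_lag] of a real signal."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    return np.array([np.dot(x[:n - k], x[k:]) / n for k in range(max_lag + 1)])


def lsmyw_ar(x, p, q, n_eq):
    """Least-squares solution of the modified Yule-Walker equations
    r[k] + sum_{i=1..p} a_i * r[k-i] = 0  for k = q+1, ..., q+n_eq,
    which hold for an ARMA(p, q) process because lags beyond q are
    unaffected by the MA part. Returns the AR coefficients a_1..a_p."""
    r = autocorr(x, q + n_eq)
    lags = range(q + 1, q + n_eq + 1)
    # r is symmetric for a real signal, so r[k-i] is looked up as r[|k-i|].
    R = np.array([[r[abs(k - i)] for i in range(1, p + 1)] for k in lags])
    rhs = -np.array([r[k] for k in lags])
    a_hat, *_ = np.linalg.lstsq(R, rhs, rcond=None)
    return a_hat


if __name__ == "__main__":
    # Assumed ARMA(2, 1) example with one pole pair and one zero.
    x = simulate_arma(a=[-1.5, 0.8], b=[1.0, 0.5], n=20000)
    print(lsmyw_ar(x, p=2, q=1, n_eq=10))
```

Using more equations than unknowns (`n_eq` greater than `p`) is what makes this a least-squares variant; an ordinary AR fit would instead start at lag 1 and be biased by the zeros of the process.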

 

Key words-ARMA model; AR model; whispered speech; LSMY algorithm; FDSM algorithm

 

Manuscript Number: 1674-8042(2010)03-0300-04

 

DOI: 10.3969/j.issn.1674-8042.2010.03.22

 

 
