Glide landmark detection using band-limited energy ratio contours

Soojin Park, Jeungyoon Choi, Honggoo Kang

 
(Dept. of Electrical and Electronic Engineering, Yonsei University, Seoul 120-749, Korea)
 
Abstract: A detection system for American English glides /w y r l/ in a knowledge-based automatic speech recognition system is presented. The method detects dips in the ratio of band-limited energy to total energy, instead of detecting dips along the unmodified band-limited energy contours. Using the band-limited energy ratio makes dip detection applicable not only in intervocalic regions but also in non-intervocalic regions. A Gaussian mixture model (GMM) based classifier is then used to separate the detected vowels and nasals. The approach is tested on the TIMIT corpus and yields an overall detection rate of 69.5%, a 4.7% absolute increase in detection rate over a hidden Markov model (HMM) based phone recognizer.
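The core idea of the abstract, detecting dips along a band-limited-to-total energy ratio contour, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `band_energy_ratio` and `find_dips`, the frame length, hop size, band edges and depth threshold are all assumptions chosen for illustration, and the dip rule is a simple local-minimum test rather than the detector used in the paper.

```python
import numpy as np

def band_energy_ratio(signal, sample_rate, band_hz, frame_len=400, hop=160):
    """Per-frame ratio of the energy inside band_hz to the total frame energy.

    frame_len and hop are in samples (assumed values: 25 ms / 10 ms at 16 kHz).
    """
    window = np.hanning(frame_len)
    freqs = np.fft.rfftfreq(frame_len, d=1.0 / sample_rate)
    in_band = (freqs >= band_hz[0]) & (freqs < band_hz[1])
    ratios = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        total = power.sum()
        # Guard against silent frames to avoid division by zero.
        ratios.append(power[in_band].sum() / total if total > 0 else 0.0)
    return np.array(ratios)

def find_dips(contour, min_depth=0.2):
    """Indices of local minima lying at least min_depth below the
    highest contour value on each side (an assumed, simplified dip rule)."""
    dips = []
    for i in range(1, len(contour) - 1):
        if contour[i] <= contour[i - 1] and contour[i] < contour[i + 1]:
            left_peak = contour[:i].max()
            right_peak = contour[i + 1:].max()
            if min(left_peak, right_peak) - contour[i] >= min_depth:
                dips.append(i)
    return dips
```

Because each frame is normalised by its own total energy, the contour is insensitive to overall level, which is one plausible reading of why the ratio remains usable where no vowel flanks the glide on both sides. On real speech, several bands and a smoothed contour would likely be needed.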
 
Key words: landmarks; glide detection; knowledge-based speech recognition
 
CLD number: TN912.34 Document code: A
 
Article ID: 1674-8042(2012)04-0352-05   doi: 10.3969/j.issn.1674-8042.2012.04.011
 
 