|Title:||Acoustic Features Based Automatic Segmentation of Syllables|
Sharma, R. K. (Guide)
|Keywords:||Acoustic Features, Segmentation, Syllables, Speech Recognition|
|Abstract:||Automatic speech recognition (ASR) has intrigued researchers for the past several years and, as such, significant contributions have been made in this field. The recognition process has been carried out at various levels, taking into consideration different speech units such as words, syllables and phonemes. The past few decades have witnessed substantial work in the field of automatic syllabification, i.e., dividing a word into its constituent syllables, which in turn are composed of phonemes. The pronunciation of a phoneme tends to vary depending on its location within a syllable. As such, an acoustic analysis of phonemes has been carried out at a syllabic level. In order to achieve this goal, the automatic segmentation of a syllable into its constituent phonemes has been undertaken in the present work. In this work, the nasal consonants (ਮ /m/ and ਨ /n/) and the vowels (ਅ /ə/, ਆ /a/, ਐ /æ/, ਏ /e/, ਈ /i/, ਓ /o/, ਔ /ɔ/ and ਊ /u/) of the Punjabi language have been focused upon. Nasals are the only class of sounds that exhibit significant speech output from the nasal cavity as opposed to the oral cavity. Thus, it was of interest to examine how the nasal consonants may be perceived at a syllabic level. Using the acoustic-phonetic approach to ASR, the acoustic features, namely envelope variance, energy level and spectral peak frequency, have been examined. The characteristics of these acoustic features exhibited by the nasal consonants in the context of the adjoining vowels have been investigated. As such, the focus of this study has been syllables of the type Nasal Consonant-Vowel (such as ਮਾ /maː/) and Vowel-Nasal Consonant (such as ਆਮ /aːm/). In order to carry out the automatic segmentation of these syllables, two approaches have been presented in this work. 
The first approach uses the change in the energy levels within a syllable to locate the point of segmentation of the syllable, while the second approach uses the change in the envelope variance of the syllable to perform segmentation. Further, the acoustic features have been used to train a support vector machine (SVM) based classifier using LibSVM that, in turn, identifies the nasal consonant part and the vowel part of the syllable. The work undertaken in this thesis has been divided into six chapters. These chapters are: Introduction; Review of Literature; Data Collection, Preprocessing and Feature Extraction; Acoustic Features based automatic segmentation of Nasal Consonant-Vowel and Vowel-Nasal Consonant syllables; Results and Discussion; and Conclusion and Future Scope. The first chapter describes ASR and its various approaches: the acoustic-phonetic approach, the pattern-recognition approach and the artificial intelligence approach. Further, different manners of speech production and their correspondence to the vocal tract have been discussed. Since the characteristics of the nasal consonants have been analyzed in the context of the vowels, the articulation and perception of the nasal consonants and the vowels have been discussed. The nasal consonants and the vowels of the Punjabi language have been considered in the present thesis. Thus, a brief description of Punjabi phonology and Punjabi syllables has also been given in this chapter. Further, because of the significance of the spectral changes perceived in the nasal-vowel and vowel-nasal articulation, spectrograms of spoken syllables have been presented. A brief introduction to SVMs has also been given in this chapter. In this work, SVM based classifiers have been trained using LibSVM in order to identify the nasal consonant part and the vowel part of a syllable. The second chapter presents a survey of the research undertaken in the field of ASR. 
The review has been organized into three parts: the use of feature extraction techniques for automatic speech recognition; the research carried out for automatic segmentation of continuous speech; and the analysis of the nasals, semivowels and vowels. In the third chapter, the various phases such as data collection, preprocessing and feature extraction have been discussed. In this study, the collected data comprises syllables consisting of Punjabi nasal consonants and vowels. The syllables collected are of two types: Nasal Consonant-Vowel and Vowel-Nasal Consonant. Since the recording has been done in a noise-free environment, the preprocessing phase includes windowing and framing only. Further, the three acoustic features, namely envelope variance, energy level and spectral peak frequency, have been explained in this chapter. The fourth chapter elaborates on the characteristics of the acoustic features introduced in chapter three. These features have been observed for the nasal consonants and the vowels, considering them at a syllabic level. Further, the two approaches to automatic segmentation of syllables, using energy differences and using envelope variance differences, have been described. The algorithms developed for these approaches and the corresponding flowcharts have also been documented in this chapter. In the fifth chapter, the results of the experiments that were carried out for automatic segmentation of Nasal Consonant-Vowel and Vowel-Nasal Consonant syllables have been presented. The characteristics of the acoustic features, namely envelope variance, energy level and spectral peak frequency, have been depicted graphically for the two syllable forms focused on in this work. Further, the recognition accuracy achieved in the experiments undertaken using LibSVM, for the two approaches, has been presented. Finally, a comparison of the two approaches to automatic segmentation of syllables has been documented. 
It has been observed that the automatic segmentation of the syllables (Nasal Consonant-Vowel and Vowel-Nasal Consonant) gives better results when carried out using the envelope variance differences than when carried out using the energy differences. In the sixth chapter, the conclusion, limitations and future scope of the present work have been presented in order to facilitate a refinement of this work.|
|Appears in Collections:||Masters Theses@SOM|
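
The two segmentation approaches named in the abstract (change in energy level and change in envelope variance across a syllable) can be illustrated with a minimal sketch. This is not the thesis's algorithm: the record does not give the frame length, the window shape, the exact definition of envelope variance, or the rule for picking the segmentation point. The sketch assumes rectangular frames, takes the envelope to be the rectified signal |x|, and places the boundary at the largest jump between adjacent frames. All function names and default parameters are hypothetical.

```python
import math


def frame_signal(samples, frame_len, hop):
    """Split a list of samples into overlapping frames (rectangular window)."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Short-time energy: sum of squared samples in the frame."""
    return sum(x * x for x in frame)


def envelope_variance(frame):
    """Variance of the rectified signal |x| in the frame (assumed definition)."""
    env = [abs(x) for x in frame]
    mean = sum(env) / len(env)
    return sum((e - mean) ** 2 for e in env) / len(env)


def boundary_frame(values):
    """Index of the frame that follows the largest jump between adjacent frames."""
    diffs = [abs(values[i + 1] - values[i]) for i in range(len(values) - 1)]
    return diffs.index(max(diffs)) + 1


def segment(samples, feature, frame_len=200, hop=100):
    """Estimate the sample index where the syllable splits into two parts.

    `feature` is either frame_energy or envelope_variance. The frame length
    and hop are illustrative values, not taken from the thesis.
    """
    frames = frame_signal(samples, frame_len, hop)
    values = [feature(f) for f in frames]
    return boundary_frame(values) * hop
```

Calling `segment(samples, frame_energy)` or `segment(samples, envelope_variance)` on a recorded Nasal Consonant-Vowel syllable would return one candidate boundary per approach. The two resulting parts could then be passed to a separately trained classifier, which the abstract says was built with LibSVM.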