Browsing by Author "Chandra E"

AUTOMATIC SPEECH RECOGNITION: ARCHITECTURE, METHODOLOGIES, CHALLENGES - A REVIEW
(International Journal of Advanced Research in Computer Science, 2011-11) Karpagavalli S; Deepika R; Kokila P; Usha Rani K; Chandra E
For more than three decades, a great deal of research has been carried out on various aspects of speech signal processing and its applications. One highly successful application of speech processing is Automatic Speech Recognition (ASR). Early attempts at ASR consisted of building deterministic models of whole words in a small vocabulary and recognizing a given speech utterance as the word whose model comes closest to it. The introduction of Hidden Markov Models (HMMs) in the early 1980s provided a much more powerful tool for speech recognition, and recognition can now be done for continuous speech using a large vocabulary in a speaker-independent manner. Today, many products have been developed that successfully use ASR for communication between humans and machines. The performance of speech recognition applications deteriorates in the presence of reverberation and even low levels of ambient noise. Robustness to noise, reverberation and the characteristics of the transducer is still an unsolved problem, which keeps research in speech recognition very active. A detailed study of ASR is carried out and presented in this paper that covers the basic model of speech recognition, applications
A HIERARCHICAL APPROACH IN TAMIL PHONEME CLASSIFICATION USING SUPPORT VECTOR MACHINE
(Indian Journal of Science and Technology, 2015-12) Karpagavalli S; Chandra E
Most speech recognition systems are built on the phoneme, the sub-word unit that is the basic sound unit of a language. In the proposed work, a novel hierarchical approach to phoneme classification is used to reduce time complexity and search space. A set of Tamil phonemes is classified hierarchically in three levels. Phoneme boundaries in a given speech utterance are identified using the Spectral Transition Measure (STM), and the phonemes are separated. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted for each phoneme, which is represented by 9 frames including the contextual frames of the corresponding phoneme. At each hierarchical level, a different number of models is built using Support Vector Machines (SVMs) to classify each phoneme group or phoneme. The results show that, in the hierarchical approach, the phoneme group recognition rate at levels 1 and 2 improves greatly compared to the flat classification model. The complexity of the search space is significantly reduced at levels 2 and 3 compared to the flat phoneme classification model. The hierarchical phoneme classifier is well suited to phoneme recognition tasks, which are useful in applications such as spoken term detection, out-of-vocabulary detection, named entity recognition and spoken document retrieval.
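The boundary detection step mentioned in the abstract above can be illustrated with a minimal sketch. It assumes a common formulation of the Spectral Transition Measure (the mean squared regression slope of each feature coefficient over a short window of frames) and a simple peak-picking rule; the function names, the window size and the threshold are hypothetical and are not taken from the paper.

```python
def spectral_transition_measure(frames, half_window=2):
    """Return one STM value per frame.

    frames: list of equal-length feature vectors (for example MFCCs).
    Assumption: STM is the mean squared regression slope of each
    coefficient over the frames t-half_window .. t+half_window.
    Frames too close to either end keep an STM of 0.0.
    """
    n_frames = len(frames)
    dim = len(frames[0])
    offsets = range(-half_window, half_window + 1)
    denom = sum(k * k for k in offsets)
    stm = [0.0] * n_frames
    for t in range(half_window, n_frames - half_window):
        total = 0.0
        for d in range(dim):
            # Least-squares slope of coefficient d around frame t.
            slope = sum(k * frames[t + k][d] for k in offsets) / denom
            total += slope * slope
        stm[t] = total / dim
    return stm


def pick_boundaries(stm, threshold):
    """Return frame indices where the STM has a local peak above threshold."""
    return [
        t
        for t in range(1, len(stm) - 1)
        if stm[t] > threshold and stm[t] >= stm[t - 1] and stm[t] > stm[t + 1]
    ]


# Toy input: six frames of one steady sound followed by six of another.
toy_frames = [[0.0, 0.0]] * 6 + [[1.0, 1.0]] * 6
toy_boundaries = pick_boundaries(spectral_transition_measure(toy_frames), 0.05)
```

On the toy input the single detected boundary is frame 6, the first frame of the second sound. Real MFCC trajectories are noisier, so the threshold and window would need tuning on actual data.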
ISOLATED TAMIL DIGIT SPEECH RECOGNITION USING TEMPLATE-BASED AND HMM-BASED APPROACHES
(Springer, 2012-07) Karpagavalli S; Deepika R; Kokila P; Usha Rani K; Chandra E
For more than three decades, a great deal of research has been carried out on various aspects of speech signal processing and its applications. One highly successful application of speech processing is Automatic Speech Recognition (ASR). Early attempts at ASR consisted of building deterministic models of whole words in a small vocabulary and recognizing a given speech utterance as the word whose model comes closest to it. The introduction of Hidden Markov Models (HMMs) in the early 1980s provided a much more powerful tool for speech recognition, and recognition can now be done for continuous speech using a large vocabulary in a speaker-independent manner. Two approaches, conventional template-based matching and Hidden Markov Models, are commonly used for speaker-independent isolated word recognition. In this work, speaker-independent isolated Tamil digit speech recognizers are designed using template-based and HMM-based approaches. The results of the two approaches are compared, and it is observed that the HMM-based model performs well and greatly reduces the word error rate.
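As an illustration of the template-based approach described above, the following sketch matches an utterance against one stored template per word. The abstract does not name the matching algorithm; dynamic time warping (DTW) with a Euclidean frame distance is assumed here because it is a widely used choice for template matching, and the function names and toy data are hypothetical.

```python
def dtw_distance(a, b):
    """Dynamic time warping cost between two sequences of feature vectors."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Euclidean distance between the two frames.
            d = sum((x - y) ** 2 for x, y in zip(a[i - 1], b[j - 1])) ** 0.5
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(utterance, templates):
    """Return the label of the template with the smallest DTW cost."""
    return min(templates, key=lambda label: dtw_distance(utterance, templates[label]))


# Toy one-dimensional "features"; real systems would use MFCC vectors.
toy_templates = {
    "one": [[0.0], [1.0], [2.0]],
    "two": [[2.0], [1.0], [0.0]],
}
toy_utterance = [[0.0], [0.0], [1.0], [2.0], [2.0]]  # a slower "one"
toy_label = recognize(toy_utterance, toy_templates)
```

DTW absorbs differences in speaking rate by letting one template frame align with several utterance frames, which is why the slowed-down toy utterance still matches "one" at zero cost.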
PHONEME AND WORD BASED MODEL FOR TAMIL SPEECH RECOGNITION USING GMM-HMM
(International Conference on Advanced Computing & Communication Systems, held at Sri Eshwar College of Engineering, Coimbatore during 5-7 January 2015 and published in the conference proceedings, indexed in IEEE Xplore Digital Library, 2015-01-05) Karpagavalli S; Chandra E
Speech is the standard means of communication among people. Automatic Speech Recognition (ASR) applications allow users to interact with machines through speech and perform their tasks effortlessly. Speech recognition applications in native languages will enable illiterate and semi-literate people to use computer services with little or no knowledge of operating computers, and to lead better lives. In the proposed work, speaker-independent isolated phoneme and word recognition systems have been developed for the Indian regional language Tamil. The Hidden Markov Model Toolkit (HTK) was used to develop the speaker-independent phoneme-based and word-based Tamil speech recognition systems. The work involves three main tasks: feature extraction, acoustic model building and decoding. Mel-Frequency Cepstral Coefficients (MFCCs) are extracted from the speech utterances, and Hidden Markov Models (HMMs) are used to build the acoustic model. In building the acoustic model, multivariate Gaussian Mixture Models with different numbers of components are used to estimate the state emission probabilities, and finally a Viterbi decoder is employed to recognize the test speech utterances. A small vocabulary of 50 words, collected from 10 native speakers of Tamil, was used to build and test the model. The performance of both the phoneme-based and word-based models has been analyzed, and the recognition accuracy and word error rate of the models are discussed.
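The decoding step named in the abstract above can be sketched as a log-domain Viterbi search over HMM states. This is a generic textbook version, not HTK's implementation: the per-frame emission log-likelihoods are taken as input (in a GMM-HMM system they would come from the Gaussian mixtures), and the function name and the toy two-state model are assumptions made for illustration.

```python
import math

NEG_INF = float("-inf")


def viterbi(log_init, log_trans, log_emit):
    """Return (best state path, its log probability).

    log_init[s]     : log probability of starting in state s
    log_trans[r][s] : log probability of moving from state r to state s
    log_emit[t][s]  : log likelihood of frame t under state s
    """
    n_states = len(log_init)
    score = [log_init[s] + log_emit[0][s] for s in range(n_states)]
    back = []  # back[t-1][s] = best predecessor of state s at frame t
    for t in range(1, len(log_emit)):
        prev = score
        score, pointers = [], []
        for s in range(n_states):
            best = max(range(n_states), key=lambda r: prev[r] + log_trans[r][s])
            pointers.append(best)
            score.append(prev[best] + log_trans[best][s] + log_emit[t][s])
        back.append(pointers)
    # Trace the best final state back to the first frame.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path, max(score)


# Toy left-to-right model: state 0 fits the first two frames, state 1 the rest.
toy_init = [0.0, NEG_INF]
toy_trans = [[math.log(0.5), math.log(0.5)], [NEG_INF, 0.0]]
toy_emit = [
    [math.log(0.9), math.log(0.1)],
    [math.log(0.9), math.log(0.1)],
    [math.log(0.1), math.log(0.9)],
    [math.log(0.1), math.log(0.9)],
]
toy_path, toy_score = viterbi(toy_init, toy_trans, toy_emit)
```

Working in log probabilities avoids numerical underflow on long utterances, and forbidden transitions are represented by negative infinity rather than by a zero probability.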
RECOGNITION OF TAMIL SYLLABLES USING VOWEL ONSET POINTS WITH PRODUCTION, PERCEPTION BASED FEATURES
(ICTACT Journal on Soft Computing, 2016-01) Karpagavalli S; Chandra E
Tamil is one of the ancient Dravidian languages spoken in south India. Most Indian languages are syllabic in nature, and their syllables take the form of Consonant-Vowel (CV) units. In Tamil, the CV pattern occurs at the beginning, middle and end of a word. In this work, CV units formed with Stop Consonant – Short Vowel (SCSV) combinations were considered for the classification task. The work was carried out in three stages: Vowel Onset Point (VOP) detection, CV segmentation and classification. A VOP is the event at which the consonant part ends and the vowel part begins. VOPs are identified using linear prediction residuals, which capture significant characteristics of the excitation source. To segment the CV units, fixed-length spectral frames before and after the VOPs are considered. In this work, production-based features, Linear Predictive Cepstral Coefficients (LPCC), and perception-based features, Perceptual Linear Predictive Cepstral Coefficients (PLP) and Mel-Frequency Cepstral Coefficients (MFCC), are extracted and used to build the SCSV classifier with a multilayer perceptron and a support vector machine. A speech corpus of 200 Tamil words uttered by 15 native speakers was used, which covers all SCSV units formed with the Tamil stop consonants (/k/, /ch/, /d/, /t/, /p/) and short vowels (/a/, /i/, /u/, /e/, /o/). The classifiers are trained and tested using predictive accuracy as the performance measure. The results indicate that the perception-based features, MFCC and PLP, give better results than the production-based features, LPCC, and that the model built using the support vector machine outperforms the multilayer perceptron.
A REVIEW ON AUTOMATIC SPEECH RECOGNITION ARCHITECTURE AND APPROACHES
(International Journal of Signal Processing, Image Processing and Pattern Recognition, 2016) Karpagavalli S; Chandra E
Speech is the most natural mode of communication for human beings. The task of speech recognition is to have a computer program convert speech into a sequence of words. Speech recognition applications enable people to use speech as another input mode and so interact with applications easily and effectively. Speech recognition interfaces in native languages will enable illiterate and semi-literate people to use the technology to a greater extent without knowing how to operate a computer keyboard or stylus. For more than three decades, a great deal of research has been carried out on various aspects of speech recognition and its applications. Today, many products have been developed that successfully use automatic speech recognition for communication between humans and machines. The performance of speech recognition applications deteriorates in the presence of reverberation and even low levels of ambient noise. Robustness to noise, reverberation and the characteristics of the transducer is still an unsolved problem, which keeps research in speech recognition very active. A detailed study of automatic speech recognition is carried out and presented in this paper that covers the architecture, speech parameterization, methodologies, characteristics, issues, databases, tools and applications.
A REVIEW ON SUB-WORD UNIT MODELING IN AUTOMATIC SPEECH RECOGNITION
(IOSR Journal of VLSI and Signal Processing, 2016-12) Karpagavalli S; Chandra E
The primary issue in designing a speech recognition system is the choice of a suitable modeling unit. Speech recognition systems may be based on any one of several modeling units, such as the word, the phoneme or the syllable. The selection of a sub-word unit depends on many factors, such as vocabulary size, complexity of the task and language. The phoneme, an indivisible unit of sound in a particular language, is the most commonly used sub-word unit in state-of-the-art speech recognition systems. The choice of sub-word units, and the way in which the recognizer represents words as combinations of those units, is the problem of sub-word modeling. This paper explores the various sub-word unit models used in speech recognition and presents the advantages and disadvantages of each sub-word unit.
TAMIL PHONEME CLASSIFICATION USING CONTEXTUAL FEATURES AND DISCRIMINATIVE MODELS
(International Conference on Communication and Signal Processing (ICCSP’15), Adhiparasakthi Engineering College, Melmaruvathur, indexed in IEEE Xplore Digital Library, 2015) Karpagavalli S; Chandra E
Speech recognition systems may be designed around any one of the sub-word units: phoneme, triphone or syllable. Phonemes are a set of base forms for representing the unique sounds in a particular language. In supervised phoneme classification, the phoneme segmentation, features and class labels are given, and the goal is to classify the phoneme. Phoneme classification and recognition can be useful in applications such as spoken document retrieval, named entity extraction, out-of-vocabulary detection, language identification and spoken term detection. In trained speech, each phoneme occurs clearly in the speech waveform. In spontaneous speech, the co-articulation effect means that each phoneme is influenced by its adjacent phonemes, so left and right context frame information plays a vital role in accurate phoneme classification. In the proposed work, three discriminative classifiers, Multilayer Perceptron, Naive Bayes and Support Vector Machine, are used to classify 25 phonemes of the Tamil language. The approximate phoneme boundaries are identified using the Spectral Transition Measure (STM). After segmentation, Mel-Frequency Cepstral Coefficients (MFCCs) of 9 frames, comprising 4 left context frames, 1 centre frame corresponding to the phoneme and 4 right context frames, are extracted and used as input to the classifiers. A Tamil word dataset was prepared to cover the 25 phonemes of the language. The performance of the classifiers is analysed and the results are presented.
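The 9-frame contextual input described above (4 left context frames, 1 centre frame, 4 right context frames) can be sketched as a simple concatenation of per-frame feature vectors. The function name is hypothetical, and repeating the first or last frame when the window runs past the utterance edge is an assumption; the abstract does not say how edges are handled.

```python
def stack_context(frames, centre, left=4, right=4):
    """Concatenate the feature vectors around a centre frame.

    frames: list of per-frame feature vectors (for example MFCCs).
    centre: index of the frame that represents the phoneme.
    Assumption: indices outside the utterance are clamped, so the
    first or last frame is repeated at the edges.
    """
    last = len(frames) - 1
    stacked = []
    for offset in range(-left, right + 1):
        idx = min(max(centre + offset, 0), last)
        stacked.extend(frames[idx])
    return stacked


# Toy utterance: 12 frames with 2 coefficients each.
toy_frames = [[float(i), float(i) + 0.5] for i in range(12)]
toy_vector = stack_context(toy_frames, centre=5)
```

With 2 coefficients per frame the stacked vector has 18 values; with the 13 coefficients often used for MFCCs it would have 117. The stacked vector is what would be passed to a classifier as one training or test example.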