Department of Computer Science (UG)
Permanent URI for this community: https://dspace.psgrkcw.com/handle/123456789/150
5 results
Search Results
Item TAMIL PHONEME CLASSIFICATION USING CONTEXTUAL FEATURES AND DISCRIMINATIVE MODELS (International Conference on Communication and Signal Processing (ICCSP’15), Adhiparasakthi Engineering College, Melmaruvathur, indexed in IEEE Xplore Digital Library, 2015) Karpagavalli S; Chandra E
Speech recognition systems may be designed around any one of the sub-word units: phoneme, triphone or syllable. The phonemes are a set of base-forms for representing the unique sounds in a particular language. In supervised phoneme classification, the phoneme segmentation, features and class labels are given, and the goal is to classify the phoneme. Phoneme classification and recognition can be useful in applications such as spoken document retrieval, named entity extraction, out-of-vocabulary detection, language identification, and spoken term detection. In trained speech, each phoneme occurs clearly in the speech waveform. In spontaneous speech, due to the co-articulation effect, each phoneme is influenced by its adjacent phonemes, so left and right context frame information plays a vital role in accurate phoneme classification. In the proposed work, three discriminative classifiers, namely Multilayer Perceptron, Naive Bayes and Support Vector Machine, are used to classify 25 phonemes of the Tamil language. The approximate phoneme boundaries are identified using the Spectral Transition Measure (STM). After segmentation, Mel Frequency Cepstral Coefficients (MFCC) of 9 frames, comprising 4 left context frames, 1 centre frame corresponding to the phoneme and 4 right context frames, are extracted and used as input to the classifiers. A Tamil word dataset was prepared to cover 25 phonemes of the language. The performance of the classifiers is analysed and the results are presented.

Item STOP CONSONANT-SHORT VOWEL (SCSV) CLASSIFICATION FOR TAMIL SPEECH UTTERANCES (2016-02) S, Karpagavalli; E, Chandra
Tamil is one of the ancient Dravidian languages spoken in South India.
Most of the Indian languages are syllabic in nature, and syllables are in the form of Consonant-Vowel (CV) units. In the Tamil language, the CV pattern occurs at the beginning, middle and end of a word. In this work, CV units formed with Stop Consonant – Short Vowel (SCSV) were considered for the classification task. The work was carried out in three stages: Vowel Onset Point (VOP) detection, CV segmentation and classification. The VOP is the event at which the consonant part ends and the vowel part begins. VOPs are identified using linear prediction residuals, which provide significant characteristics of the excitation source. To segment the CV units, fixed length spectral frames before and after the VOPs are considered. Both production based features, Linear Predictive Cepstral Coefficients (LPCC), and perception based features, Mel Frequency Cepstral Coefficients (MFCC), are extracted and given as input to classifiers designed with multilayer perceptron and support vector machine. A speech corpus of 200 Tamil words uttered by 15 native speakers was used, which covers all SCSV units formed with Tamil stop consonants (/k/, /ch/, /d/, /t/, /p/) and short vowels (/a/, /i/, /u/, /e/, /o/). The classifiers are trained and tested for their performance using various measures. The results indicate that the model built with MFCC features using a support vector machine with an RBF kernel performs best.

Item PREDICTION OF LUNG DISEASE USING HOG FEATURES AND MACHINE LEARNING ALGORITHMS (Innovative Research in Computer and Communication Engineering, 2016-01) Pradeeba R; Karpagavalli S
Lung diseases affect a large number of people worldwide. India has seen a sharp rise in respiratory disease due to infection, smoking and air pollution. Respiratory diseases are no longer restricted to the elderly and are now being detected even in younger age groups. The early and correct diagnosis of any pulmonary disease is essential for timely treatment and for preventing mortality.
From a clinical standpoint, medical diagnosis tools and systems are of great importance. The proposed work aims to establish a more advanced diagnostic strategy for lung diseases using CT scan images. Three types of lung disease, Emphysema, Pneumonia and Bronchitis, are considered in this work. A dataset with 126 CT scan images of Emphysema, 120 CT scan images of Pneumonia and 120 CT scan images of Bronchitis was collected from the National Biomedical Imaging Archive (NBIA) database. The classification of lung disease using Histogram of Oriented Gradients (HOG) features is carried out using the classifiers Naive Bayes (NB), Decision tree (J48), Multilayer Perceptron (MLP) and Support Vector Machine (SVM). The performance of the models is compared in terms of predictive accuracy and the results are presented.

Item RECOGNITION OF TAMIL SYLLABLES USING VOWEL ONSET POINTS WITH PRODUCTION, PERCEPTION BASED FEATURES (ICTACT Journal on Soft Computing, 2016-01) Karpagavalli S; Chandra E
Tamil is one of the ancient Dravidian languages spoken in South India. Most of the Indian languages are syllabic in nature, and syllables are in the form of Consonant-Vowel (CV) units. In the Tamil language, the CV pattern occurs at the beginning, middle and end of a word. In this work, CV units formed with Stop Consonant – Short Vowel (SCSV) were considered for the classification task. The work was carried out in three stages: Vowel Onset Point (VOP) detection, CV segmentation and classification. The VOP is the event at which the consonant part ends and the vowel part begins. VOPs are identified using linear prediction residuals, which provide significant characteristics of the excitation source. To segment the CV units, fixed length spectral frames before and after the VOPs are considered.
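The segmentation step described above, taking a fixed number of frames on each side of a detected VOP, might be sketched as follows. This is a minimal illustration only, not the authors' implementation: the function name `segment_cv_unit`, the default window sizes, and the edge-padding behaviour near the start or end of an utterance are all assumptions, since the abstract does not give these details.

```python
from typing import List, Sequence


def segment_cv_unit(
    frames: Sequence[Sequence[float]],
    vop_index: int,
    n_before: int = 4,  # assumed window size; not stated in the abstract
    n_after: int = 4,   # assumed window size; not stated in the abstract
) -> List[Sequence[float]]:
    """Return the frames from vop_index - n_before to vop_index + n_after.

    `frames` is a per-frame feature sequence (for example one MFCC vector
    per frame). Indices that fall outside the utterance are clamped to the
    first or last frame (an assumed edge-padding choice), so the result
    always has n_before + 1 + n_after frames.
    """
    if not frames:
        raise ValueError("frames must not be empty")
    if not 0 <= vop_index < len(frames):
        raise IndexError("vop_index is outside the utterance")
    last = len(frames) - 1
    window = range(vop_index - n_before, vop_index + n_after + 1)
    return [frames[min(max(i, 0), last)] for i in window]
```

Because every segment has the same number of frames, the per-frame vectors can be concatenated into one fixed-size input vector for a classifier such as a multilayer perceptron or support vector machine.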
In this work, production based features, Linear Predictive Cepstral Coefficients (LPCC), and perception based features, Perceptual Linear Predictive Cepstral Coefficients (PLP) and Mel Frequency Cepstral Coefficients (MFCC), are extracted and used to build the SCSV classifier with multilayer perceptron and support vector machine. A speech corpus of 200 Tamil words uttered by 15 native speakers was used, which covers all SCSV units formed with Tamil stop consonants (/k/, /ch/, /d/, /t/, /p/) and short vowels (/a/, /i/, /u/, /e/, /o/). The classifiers are trained and tested for their performance using predictive accuracy. The results indicate that the perception based features, MFCC and PLP, provide better results than the production based feature, LPCC, and that the model built using support vector machine performs best.

Item EMAIL SPAM FILTERING USING SUPERVISED MACHINE LEARNING TECHNIQUES (International Journal of Advanced Research in Computer Science, 2011-12) Christina V; Karpagavalli S; Suganya G
E-mail spam, known as unsolicited bulk email (UBE), junk mail, or unsolicited commercial email (UCE), is the practice of sending unwanted e-mail messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam is prevalent on the Internet because the transaction cost of electronic communications is radically less than that of any alternate form of communication. There are many spam filters that use different approaches to identify an incoming message as spam, including white list / black list, Bayesian analysis, keyword matching, mail header analysis, postage, legislation, and content scanning. Even so, we are still flooded with spam emails every day. This is not because the filters are not powerful enough; it is due to the swift adoption of new techniques by spammers and the inflexibility of spam filters in adapting to the changes. In our work, we employed supervised machine learning techniques to filter email spam messages.
Widely used supervised machine learning techniques, namely the C4.5 Decision tree classifier, Multilayer Perceptron and Naïve Bayes classifier, are used to learn the features of spam emails, and the models are built by training with known spam emails and legitimate emails. The results of the models are discussed.
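As a rough illustration of training a supervised filter on known spam and legitimate emails, the sketch below implements a small multinomial Naïve Bayes classifier with Laplace smoothing. It is not the authors' setup: the whitespace tokenizer, the function names `train_naive_bayes` and `predict`, and the toy messages are assumptions made purely for illustration.

```python
import math
from collections import Counter
from typing import Dict, List


def train_naive_bayes(messages: List[str], labels: List[str]) -> Dict:
    """Count word occurrences per class from labelled messages."""
    word_counts: Dict[str, Counter] = {}
    class_counts = Counter(labels)
    vocab = set()
    for text, label in zip(messages, labels):
        tokens = text.lower().split()  # assumed tokenizer: whitespace split
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return {"word_counts": word_counts, "class_counts": class_counts, "vocab": vocab}


def predict(model: Dict, text: str) -> str:
    """Return the class with the highest log-posterior for `text`."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocab"])
    best_label, best_score = "", -math.inf
    for label, n_docs in model["class_counts"].items():
        counts = model["word_counts"][label]
        n_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for token in text.lower().split():
            if token in model["vocab"]:  # ignore words never seen in training
                # Laplace (add-one) smoothed log likelihood
                score += math.log((counts[token] + 1) / (n_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy training data (hypothetical, for illustration only)
toy_messages = [
    "win cash prize now",
    "cheap pills win money",
    "meeting agenda for monday",
    "lunch with project team",
]
toy_labels = ["spam", "spam", "ham", "ham"]
toy_model = train_naive_bayes(toy_messages, toy_labels)
```

Calling `predict(toy_model, "win money now")` classifies a new message against the word statistics learned from the labelled examples. A real filter would need a far larger corpus and more careful feature extraction than this toy setup.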