Department of Computer Science (UG)

Permanent URI for this community: https://dspace.psgrkcw.com/handle/123456789/150

PASSWORD STRENGTH PREDICTION USING SUPERVISED MACHINE LEARNING TECHNIQUES
(International Conference on Advances in Computing, Control and Telecommunication Technologies, ACT 2009, archived in IEEE Xplore and IEEE CS Digital Library, 2009) Karpagavalli S; Jamuna K S; Vijaya M S
Passwords are a vital component of system security. Though there are many alternatives to passwords for access control, passwords remain the most compelling means of authenticating identity in many applications. They provide a simple, direct means of protecting a system and they represent the identity of an individual for a system. The big vulnerability of passwords lies in their nature. Users are consistently told that a strong password is essential these days to protect private data, as there are so many ways for an unauthorized person with little technical knowledge or skill to learn the passwords of legitimate users. Thus it is important for organizations to recognize the vulnerabilities to which passwords are subjected, and develop strong policies governing the creation and use of passwords to ensure that those vulnerabilities are not exploited. In this work, password strength prediction is modeled as a classification task and supervised machine learning techniques are employed. Widely used supervised machine learning techniques, namely the C4.5 decision tree classifier, multilayer perceptron, naive Bayes classifier and support vector machine, were used for learning the model. The results of the models were compared, and it was observed that SVM performs well. The results of the models were also compared with existing password strength checking tools. The findings show that the machine learning approach has substantial capability to classify the extreme cases: strong and weak passwords.
AN EFFICIENT HIERARCHICAL CLUSTERING ALGORITHM FOR PROTEIN SEQUENCING
(Government College of Technology, Coimbatore, 2009-02-22) Arunpriya C; Meera S; Balasaravanan T
Clustering is the division of data into groups of similar objects. The main objective of this unsupervised learning technique is to find a meaningful partition by using a distance or similarity function. This paper proposes an incremental clustering algorithm, Leaders-Subleaders, an extension of the leader algorithm suitable for protein sequences in bioinformatics, for effective clustering and prototype selection for pattern classification. It is a simple and efficient technique to generate a hierarchical structure for finding the subclusters within each cluster. The experimental results of the proposed algorithm are compared with those of the Nearest Neighbour Classifier (NNC) methods. It is found to be computationally efficient when compared to NNC. Classification accuracy obtained using the representatives generated by the Leaders-Subleaders method is found to be better than that obtained using the Leaders method and the NNC method. Even if a larger number of prototypes is generated, classification time is less than with NNC methods.
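The leader-style clustering mentioned in this abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the two thresholds, and the use of Euclidean distance on numeric tuples are assumptions made for illustration (protein sequences would need a sequence-specific distance or similarity function passed in as `distance`).

```python
from math import dist  # Euclidean distance, Python 3.8+


def leader_clustering(points, threshold, distance=dist):
    """Single-pass leader clustering (illustrative sketch).

    Each point joins the first existing leader within `threshold`;
    otherwise it becomes the leader of a new cluster.
    """
    leaders, clusters = [], []
    for p in points:
        for i, leader in enumerate(leaders):
            if distance(p, leader) <= threshold:
                clusters[i].append(p)
                break
        else:  # no leader was close enough
            leaders.append(p)
            clusters.append([p])
    return leaders, clusters


def leaders_subleaders(points, threshold, sub_threshold, distance=dist):
    """Two-level hierarchy: re-cluster each leader's members with a
    smaller (hypothetical) sub_threshold to obtain subleaders."""
    leaders, clusters = leader_clustering(points, threshold, distance)
    hierarchy = []
    for leader, members in zip(leaders, clusters):
        subleaders, _ = leader_clustering(members, sub_threshold, distance)
        hierarchy.append((leader, subleaders))
    return hierarchy
```

In this sketch the leaders and subleaders act as the prototypes; a new pattern would be compared against them rather than against every training pattern, which is where the time saving over a nearest-neighbour search would come from.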
MACHINE LEARNING APPROACH FOR PREOPERATIVE ANAESTHETIC RISK PREDICTION
(Academy Publishers, Finland, 2009-05) Karpagavalli S; Jamuna K S; Vijaya M S
Risk is ubiquitous in medicine, but anaesthesia is an unusual speciality as it routinely involves deliberately placing the patient in a situation that is intrinsically full of risk. Patient safety depends on management of those risks; consequently, anaesthetists have been at the forefront of clinical risk management. Anaesthetic risk classification is of prime importance not only in carrying out day-to-day anaesthetic practice but also because it coincides with surgical risks and morbidity conditions. The preoperative assessment is made to identify the patient's risk level based on the American Society of Anesthesiologists (ASA) score, which is widely used in anaesthetic practice. This helps the anaesthetist to make timely clinical decisions. Machine learning techniques can help the integration of computer-based systems in the healthcare environment, providing opportunities to facilitate and enhance the work of medical experts and ultimately to improve the efficiency and quality of medical care. This paper presents the implementation of three supervised learning algorithms, the C4.5 decision tree classifier, Naive Bayes and Multilayer Perceptron, in the WEKA environment on the preoperative assessment dataset. The classification models were trained using data collected from 362 patients. The trained models were then used for predicting the anaesthetic risk of the patients. The prediction accuracy of the classifiers was evaluated using 10-fold cross validation and the results were compared.
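The 10-fold cross validation mentioned in this abstract can be sketched as an index split. This is a plain, unstratified illustration and not WEKA's implementation; the function name `k_fold_indices` and the fixed seed are assumptions, and only the sample count of 362 comes from the abstract.

```python
import random


def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train, test) index lists for k-fold cross validation.

    Indices are shuffled once, dealt into k folds of near-equal size,
    and each fold serves as the test set exactly once.
    """
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test


# With 362 samples and k=10, every fold holds 36 or 37 test samples.
fold_sizes = [len(test) for _, test in k_fold_indices(362)]
```

The reported accuracy would then be the average of the accuracies measured on the ten held-out folds.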
A NOVEL APPROACH FOR PASSWORD STRENGTH ANALYSIS THROUGH SUPPORT VECTOR MACHINE
(Academy Publishers, Finland, 2009-11) Karpagavalli S; Jamuna K S; Vijaya M S
Passwords are ubiquitous authentication methods and they represent the identity of an individual for a system. Users are consistently told that a strong password is essential these days to protect private data. Despite the existence of more secure methods of authenticating users, including smart cards and biometrics, password authentication continues to be the most common means in use. Thus it is important for organizations to recognize the vulnerabilities to which passwords are subjected, and develop strong policies governing the creation and use of passwords to ensure that those vulnerabilities are not exploited. This work employs a machine learning technique to analyze password strength, helping organizations launch a multi-faceted defense against password breaches and provide a highly secure environment. A supervised learning algorithm, the Support Vector Machine, is used for password classification. The linear and nonlinear SVM classification models are trained using features extracted from the password dataset. The trained model shows a prediction accuracy of about 98% under 10-fold cross validation.
DATA WAREHOUSE AUTOMATION - A REVIEW
(CIIT JOURNALS, 2010) A S, Kavitha; R, Kavitha
Business enterprises invest a lot of money to develop a data warehouse that gives them real, constant and up-to-date data for decision making. Traditionally, data warehouses are kept current through periodic updates. Periodic updates introduce a delay between operational data and warehouse data. These updates are triggered at a set time; some may set it to the evening, when there is no workload on the systems. Fixing the time this way does not work in every case. Many companies run day and night without any break, and in these situations periodic updates leave the warehouse stale. The delay depends on the periodic interval: as the interval increases, the difference between operational and warehouse data also increases. The most recent data is unavailable for analysis because it resides in operational data sources. For timely and effective decision making, the warehouse should be updated as soon as possible. Extraction, Transformation and Loading (ETL) tools are designed for updating the warehouse. When the warehouse is refreshed, the update often gets stuck because resources are overloaded. The right time should be chosen for updating the warehouse so that resources can be utilized efficiently. The warehouse is not updated just once; updating is a cyclic process. This paper introduces automation for ETL: the proposed framework selects the best time to complete the process, so that the warehouse is updated automatically as soon as resources are available, without compromising data warehouse usage.
AN INTERACTIVE TOOL FOR YARN STRENGTH PREDICTION USING SUPPORT VECTOR REGRESSION
(CPS and indexed in Thompson CSI, 2010) Selvanayaki M; Vijaya M S; Jamuna K S
Cotton, popularly known as "White Gold", has been an important commercial crop of national significance due to its immense influence on the rural economy. Transfer of technology to identify the quality of fibre is gaining importance. The physical characteristics of cotton, such as fibre length, length distribution, trash value, color grade, strength, shape, tenacity, density, moisture absorption, dimensional stability, resistance, thermal reaction, count, etc., contribute to determining the quality of cotton and in turn yarn strength. In this paper, yarn strength prediction has been modeled using regression. Support vector regression, a supervised machine learning technique, has been employed for predicting the yarn strength. The trained model was evaluated based on mean squared error and correlation coefficient, and it was found that the prediction accuracy of the SVR-based model, the intelligent reasoning method, is higher than that of the traditional statistical regression, the least squares regression model.
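The two evaluation measures named in this abstract, mean squared error and correlation coefficient, can be written out as a short sketch. The function names are assumptions, and the Pearson form of the correlation coefficient is assumed because the abstract does not say which coefficient was used.

```python
from math import sqrt


def mean_squared_error(actual, predicted):
    """Average of squared differences between actual and predicted values."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def pearson_correlation(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)
```

A lower mean squared error and a correlation coefficient closer to 1 between measured and predicted yarn strength would both indicate a better regression model.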
CLASSIFICATION OF SEED COTTON YIELD BASED ON THE GROWTH STAGES OF COTTON CROP USING MACHINE LEARNING TECHNIQUES
(IEEE Xplore and IEEE CS Digital Library, 2010-06) Jamuna K S; Karpagavalli S; Vijaya M S; Revathi P; Gokilavani S; Madhiya E
Cotton, popularly known as "White Gold", has been an important commercial crop of national significance due to its immense influence on the rural economy. Cotton seed is an important and critical link in the chain of agricultural activities extending the farmer-industry linkage. Cotton yield is associated with high quality seed, as the seed contains in itself the blueprint for agrarian prosperity in incipient form. Transfer of technology to identify the quality of seeds is gaining importance. Hence this work employs a machine learning approach to classify the quality of seeds based on the different growth stages of the cotton crop. Machine learning techniques, namely the Naïve Bayes Classifier, Decision Tree Classifier and Multilayer Perceptron, were applied for training the model. Features are extracted from a set of 900 records of different categories to facilitate training and implementation. The performance of the model was evaluated using 10-fold cross validation. The results obtained show that the Decision Tree Classifier and Multilayer Perceptron provide the same accuracy in classifying the seed cotton yield. The time taken to build the model is higher for the Multilayer Perceptron than for the Decision Tree Classifier.
DATA WAREHOUSE AUTOMATION - A REVIEW
(CIIT International Journal of Data Mining and Knowledge Engineering, 2010-10) A S, Kavitha; R, Kavitha
Business enterprises invest a lot of money to develop a data warehouse that gives them real, constant and up-to-date data for decision making. Traditionally, data warehouses are kept current through periodic updates. Periodic updates introduce a delay between operational data and warehouse data. These updates are triggered at a set time; some may set it to the evening, when there is no workload on the systems. Fixing the time this way does not work in every case. Many companies run day and night without any break, and in these situations periodic updates leave the warehouse stale. The delay depends on the periodic interval: as the interval increases, the difference between operational and warehouse data also increases. The most recent data is unavailable for analysis because it resides in operational data sources. For timely and effective decision making, the warehouse should be updated as soon as possible. Extraction, Transformation and Loading (ETL) tools are designed for updating the warehouse. When the warehouse is refreshed, the update often gets stuck because resources are overloaded. The right time should be chosen for updating the warehouse so that resources can be utilized efficiently. The warehouse is not updated just once; updating is a cyclic process. This paper introduces automation for ETL: the proposed framework selects the best time to complete the process, so that the warehouse is updated automatically as soon as resources are available, without compromising data warehouse usage.
PROACTIVE PASSWORD STRENGTH ANALYZER USING FILTERS AND MACHINE LEARNING TECHNIQUES
(International Journal of Computer Applications, 2010-10) Suganya G; Karpagavalli S
Passwords are ubiquitous authentication methods and they represent the identity of an individual for a system. Users are consistently told that a strong password is essential these days to protect private data. Despite the existence of more secure methods of authenticating users, including smart cards and biometrics, password authentication continues to be the most common means in use. Thus it is important for organizations to recognize the vulnerabilities to which passwords are subjected, and develop strong policies governing the creation and use of passwords to ensure that those vulnerabilities are not exploited. This work proposes a framework to analyze the strength of the password proactively. To analyze the chosen password, filters and support vector machine are employed. This framework can be implemented as a submodule of the access control scheme.
A STUDY ON EMAIL SPAM FILTERING TECHNIQUES
(International Journal of Computer Applications, 2010-12) Christina V; Karpagavalli S; Suganya G
Electronic mail is used daily by millions of people to communicate around the globe and is a mission-critical application for many businesses. Over the last decade, unsolicited bulk email has become a major problem for email users. An overwhelming amount of spam is flowing into users' mailboxes daily. Not only is spam frustrating for most email users, it strains the IT infrastructure of organizations and costs businesses billions of dollars in lost productivity. The need for effective spam filters is therefore increasing. In this paper, we present our study of the various problems associated with spam and of spam filtering methods and techniques.
AN EFFICIENT CANCER CLASSIFICATION USING EXPRESSIONS OF VERY FEW GENES USING SUPPORT VECTOR MACHINE
(Sun College of Engineering and Technology, Nagercoil, 2011-03-24) Arunpriya C; Balasaravanan T; Antony Selvadoss Thanamani
Gene expression profiling by microarray technique has been effectively utilized for classification and diagnostic prediction of cancer nodules. Several machine learning and data mining techniques are presently applied for identifying cancer using gene expression data. However, these techniques were not designed to deal with the particular needs of gene microarray analysis. First, microarray data is characterized by a high-dimensional feature space, often exceeding the sample-space dimensionality by a factor of 100 or more. Additionally, microarray data contains a high degree of noise. The majority of existing techniques do not adequately deal with drawbacks such as dimensionality and noise. Gene ranking methods were later introduced to overcome those problems. Some of the widely used gene ranking techniques are T-Score, ANOVA, etc. But those techniques sometimes predict the rank wrongly when a large database is used. To overcome these issues, this paper proposes a technique called Enrichment Score for ranking. The classifier used in the proposed technique is the Support Vector Machine (SVM). The experiment is performed on a lymphoma data set, and the results show better classification accuracy compared to the conventional method.
FACIAL ANIMATION TECHNIQUE
(PSGR Krishnammal College for Women, Coimbatore, 2011-10-01) Arunpriya C; Antony Selvadoss Thanamani
An unsolved problem in computer graphics is the construction and animation of realistic human facial models. Traditionally, facial models have been built painstakingly by manual digitization and animated by ad hoc parametrically controlled facial mesh deformations or kinematic approximation of muscle actions. Fortunately, animators are now able to digitize facial geometries through the use of scanning range sensors and animate them through the dynamic simulation of facial tissues and muscles. However, these techniques require considerable user input to construct facial models of individuals suitable for animation. Polygonal modeling specifies exactly each 3D point, and the points are connected to each other as polygons. This is an exacting way to get topology. Patches indirectly define a smooth curved surface from a set of control points. A small number of control points can define a complex surface. One type of spline is called NURBS, which stands for Non-Uniform Rational B-Splines. This type of patch allows each control point to have its own weight, which can affect the "pinch" of the curve at that point. So they are considered the most versatile of patches. They work very well for smooth organic objects and hence are well suited for facial modeling.
AUTOMATIC SPEECH RECOGNITION: ARCHITECTURE, METHODOLOGIES, CHALLENGES - A REVIEW
(International Journal of Advanced Research in Computer Science, 2011-11) Karpagavalli S; Deepika R; Kokila P; Usha Rani K; Chandra E
For more than three decades, a great amount of research has been carried out on various aspects of speech signal processing and its applications. A highly successful application of speech processing is Automatic Speech Recognition (ASR). Early attempts at ASR consisted of making deterministic models of whole words in a small vocabulary and recognizing a given speech utterance as the word whose model comes closest to it. The introduction of Hidden Markov Models (HMMs) in the early 1980s provided a much more powerful tool for speech recognition, and recognition can be done for continuous speech using a large vocabulary, in a speaker-independent manner. Today many products have been developed that successfully utilize ASR for communication between humans and machines. Performance of speech recognition applications deteriorates in the presence of reverberation and even low levels of ambient noise. Robustness to noise, reverberation and characteristics of the transducer is still an unsolved problem that keeps research in the area of speech recognition very active. A detailed study on ASR is carried out and presented in this paper that covers the basic model of speech recognition, applications
EMAIL SPAM FILTERING USING SUPERVISED MACHINE LEARNING TECHNIQUES
(International Journal of Advanced Research in Computer Science, 2011-12) Christina V; Karpagavalli S; Suganya G
E-mail spam, known as unsolicited bulk email (UBE), junk mail, or unsolicited commercial email (UCE), is the practice of sending unwanted e-mail messages, frequently with commercial content, in large quantities to an indiscriminate set of recipients. Spam is prevalent on the Internet because the transaction cost of electronic communications is radically less than any alternate form of communication. There are many spam filters using different approaches to identify an incoming message as spam, ranging from white lists / black lists, Bayesian analysis, keyword matching and mail header analysis to postage, legislation and content scanning. Even so, we are still flooded with spam emails every day. This is not because the filters are not powerful enough; it is due to the swift adoption of new techniques by spammers and the inflexibility of spam filters in adapting to the changes. In our work, we employed supervised machine learning techniques to filter email spam messages. Widely used supervised machine learning techniques, namely the C4.5 decision tree classifier, Multilayer Perceptron and Naïve Bayes Classifier, are used for learning the features of spam emails, and the model is built by training with known spam emails and legitimate emails. The results of the models are discussed.
EMPIRICAL EVALUATION OF FEATURE SELECTION TECHNIQUE IN EDUCATIONAL DATA MINING
(ARPN Journal of Science and Technology, 2012) A S, Kavitha; J, VijiGrpisy; R, Kavitha
In machine learning the classification task is commonly referred to as supervised learning. In supervised learning there is a specified set of classes, and objects are labeled with the appropriate class. The goal is to generalize from the training objects so that novel objects can be identified as belonging to one of the classes. Evaluating the performance of learning algorithms is a fundamental aspect of machine learning. The primary objective of this thesis is to study classification accuracy using feature selection with machine learning algorithms. Feature selection is considered successful if the dimensionality of the data is reduced and the accuracy of a learning algorithm improves or remains the same. Hence our contribution in this research is to prepare an educational dataset with real-time feedback from students and to apply it with the WEKA tool to measure classification accuracy. Part of the implementation uses WEKA, which is written in Java, and the experiments are run with the WEKA Explorer.
AN EFFICIENT LEAF RECOGNITION ALGORITHM FOR PLANT CLASSIFICATION USING SUPPORT VECTOR MACHINE
(Periyar University, Salem., 2012-03-21) Arunpriya C; Balasaravanan T; Antony Selvadoss Thanamani
Recognition of plants has become an active area of research as most plant species are at risk of extinction. This paper uses an efficient machine learning approach for the classification purpose. The proposed approach consists of three phases: preprocessing, feature extraction and classification. The preprocessing phase involves typical image processing steps such as transforming to gray scale and boundary enhancement. The feature extraction phase derives the common DMF from five fundamental features. The main contribution of this approach is the Support Vector Machine (SVM) classification for efficient leaf recognition. Twelve leaf features are extracted and orthogonalized into 5 principal variables, which are given as the input vector to the SVM. The classifier is tested with the Flavia dataset and a real dataset and compared with the k-NN approach; the proposed approach produces very high accuracy and takes very little execution time.
A SURVEY ON SPECIES RECOGNITION SYSTEM FOR PLANT CLASSIFICATION
(International Journal of Computer Technology & Applications (IJCTA), 2012-05) Arunpriya C; Antony Selvadoss Thanamani
Taxonomists and morphometricians have made several attempts at automated identification of biological species, but no effective species recognition system has emerged in several decades. A species recognition system would be very helpful for carrying out behavioral and ecological studies on plants for plant classification. Each plant species and its leaf has its own distinctive patterns, which enables researchers to classify plants accurately. In general, a plant species recognition system includes image categorization and object recognition. Plant species recognition differs from common image categorization, since the variation between the leaf of one plant species and that of another is very small. As a result, traditional image categorization techniques do not perform effectively on plant images. Automatic plant recognition is not yet well established, largely because of the lack of research in this field and the difficulty of obtaining a database. Species recognition for plants is one of the major concerns at present, and there is a great need for further research on better plant species recognition systems. In this survey, several plant species recognition systems are discussed, which will show the way towards the development of better plant species recognition systems for plant classification.
ISOLATED TAMIL DIGIT SPEECH RECOGNITION USING TEMPLATE-BASED AND HMM-BASED APPROACHES
(Springer, 2012-07) Karpagavalli S; Deepika R; Kokila P; Usha Rani K; Chandra E
For more than three decades, a great amount of research has been carried out on various aspects of speech signal processing and its applications. A highly successful application of speech processing is Automatic Speech Recognition (ASR). Early attempts at ASR consisted of making deterministic models of whole words in a small vocabulary and recognizing a given speech utterance as the word whose model comes closest to it. The introduction of Hidden Markov Models (HMMs) in the early 1980s provided a much more powerful tool for speech recognition, and recognition can be done for continuous speech using a large vocabulary, in a speaker-independent manner. Two approaches, the conventional template-based approach and the Hidden Markov Model approach, are commonly used for speaker-independent isolated word recognition. In this work, speaker-independent isolated Tamil digit speech recognizers are designed by employing template-based and HMM-based approaches. The results of the approaches are compared, and it is observed that the HMM-based model performs well and the word error rate is greatly reduced.
SCALY NEURAL NETWORKS FOR SPEECH RECOGNITION USING DTW AND TIME ALIGNMENT ALGORITHMS
(International Journal of Scientific and Research Publications, 2012-10) Sabitha P V; Karpagavalli S
Speech recognition has been an active research topic for more than 50 years. Interacting with the computer through speech is one of the active scientific research fields, particularly for the disabled community, who face a variety of difficulties in using the computer. Such research in Automatic Speech Recognition (ASR) is investigated for different languages because each language has its specific features. Neural networks are, in essence, biologically inspired networks since they are based on the current understanding of the biological nervous system. They are comprised of a network of densely interconnected simple processing elements, which perform in a manner analogous to the most elementary functions of a biological neuron. The development of neural networks and a basic introduction to their theory are outlined. Reduced connectivity neural networks are discussed and the scaly architecture neural network is described. Various algorithms are available to perform the time alignment of the input pattern to the neural network, and the performance of the neural network is dependent upon the performance of the time alignment algorithm used. In this chapter, the various types of time alignment algorithms are described and their operation is outlined in detail.
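Dynamic time warping (DTW), named in this entry's title as one of the time alignment algorithms, can be sketched with the textbook dynamic-programming recurrence. This is an illustration only: the function name, the use of scalar sequences, and the absolute-difference local cost are assumptions, and real speech features would be vectors compared frame by frame.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences.

    cost[i][j] is the minimum accumulated cost of aligning the first i
    items of `a` with the first j items of `b`, allowing either
    sequence to be stretched in time.
    """
    n, m = len(a), len(b)
    inf = float("inf")
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b repeats
                cost[i][j - 1],      # b advances, a repeats
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]
```

Because repeated frames cost nothing extra when they match, two utterances of the same pattern spoken at different speeds can align with a low total cost.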
ISOLATED TAMIL WORDS SPEECH RECOGNITION USING LINEAR PREDICTIVE CODING AND NEURAL NETWORKS
(2012-12) Sabitha P V; Karpagavalli S
Speech recognition is the ability of a computer to recognize general, naturally flowing utterances from a wide variety of users. In recent years, with the new generation of computing technology, speech technology has become the next major innovation in man-machine interaction. An Automatic Speech Recognition (ASR) system takes a human speech utterance as input and returns a string of words as output. Research on speech recognition has led to a variety of applications such as hands-free and eyes-free applications, voice user interfaces, simple data entry, forensic applications, voice authentication, biometrics, robotics, air traffic control, preparation of medical reports, learning tools for the handicapped, and reading tools for blind people. Even though research in speech recognition for the English language has attained a certain maturity, speech interfaces in Indian languages are still at the startup level. Tamil is one of the widely spoken Indian languages, with more than 77 million speakers. Speech interfaces in Indian languages will enable people in various semi-urban and rural parts of India to use telephones and Internet services. In the proposed work, an isolated Tamil word speech recognition interface is designed using a neural network algorithm. To design the system, a dataset of 10 Tamil words, each uttered 5 times by 20 speakers, has been prepared. Linear predictive coding of order 8 is used for feature extraction. Back-propagation training is carried out with the feature vectors extracted using LPC from the speech files in the dataset. The Multilayer Perceptron algorithm is employed for recognition of the words using the trained model. An interface is also designed to recognize the Tamil words uttered by the user. The average recognition rate of the system is 93.6%, and for a few words it gives 100% accuracy. The performance of the system is measured using word recognition rate and word error rate.
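The dataset described here would contain 10 × 20 × 5 = 1000 utterances. The two reported measures can be sketched as follows; this uses the common edit-distance definition of word error rate, which is an assumption because the abstract does not give its formula, and the function names and placeholder labels are hypothetical.

```python
def word_recognition_rate(true_words, predicted_words):
    """Fraction of isolated-word utterances recognized correctly."""
    correct = sum(t == p for t, p in zip(true_words, predicted_words))
    return correct / len(true_words)


def word_error_rate(reference, hypothesis):
    """(substitutions + deletions + insertions) / reference length,
    computed with word-level Levenshtein distance."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hyp) + 1))  # distances from an empty reference
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)
```

For isolated-word recognition, where each utterance yields exactly one output word, the only possible errors are substitutions, so under this definition the word error rate is simply one minus the word recognition rate.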