EFFICIENTLY MINING CLOSED SEQUENCE PATTERNS IN DNA WITHOUT CANDIDATE GENERATION
Date
2020
Publisher
International Journal Life Science Pharmacology Research
Abstract
Sequential pattern mining is a technique that efficiently finds frequent patterns in small datasets.
Traditional sequential pattern mining algorithms mine short sequences efficiently but are inefficient
at mining long sequence patterns, because they rely on candidate generation, which enlarges the search
space and increases running time. Biological DNA sequences are long and drawn from a small alphabet.
Such data can be mined to find co-occurring biological sequences, which are important for biological
data analysis and data mining. Closed sequential pattern mining is suited to mining long sequences and
yields fewer patterns, since only closed sequences are reported. This paper proposes Closed Sequential
Pattern Mining without Candidate Generation (CSPMCG), an algorithm that mines closed sequential
patterns efficiently without generating candidates. The algorithm uses two pruning methods: BackScan
pruning and a frequent occurrence check. The former prunes the search space, and the latter detects
closed sequential patterns early in the run. The proposed algorithm is compared with the PrefixSpan
and SPADE algorithms and achieves better scalability and interpretability. Experimental results on
sample DNA datasets show that the proposed algorithm outperforms the other algorithms in efficiency,
memory, and running time.
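
To make the notion of a closed sequential pattern concrete, the following is a minimal Python sketch, not the CSPMCG algorithm itself. It assumes patterns are subsequences (gaps allowed) of DNA strings, grows them PrefixSpan-style from projected suffixes without generating candidates, and then keeps only closed patterns with a simple post-filter. The function names (mine_frequent, closed_only, is_subsequence) and the toy data are hypothetical, and the BackScan pruning and frequent occurrence check named in the abstract are not implemented here.

```python
from collections import defaultdict


def mine_frequent(sequences, min_support):
    """Pattern-growth mining: extend each frequent prefix using only
    the suffixes that follow it (no candidate generation)."""
    results = {}

    def grow(prefix, projected):
        # projected holds one (sequence index, start offset) pair per
        # sequence that contains the current prefix.
        supporters = defaultdict(set)
        for sid, start in projected:
            for ch in set(sequences[sid][start:]):
                supporters[ch].add(sid)
        for ch in sorted(supporters):
            sids = supporters[ch]
            if len(sids) < min_support:
                continue
            pattern = prefix + ch
            results[pattern] = len(sids)
            # Project past the first occurrence of ch in each supporter.
            next_projected = [
                (sid, sequences[sid].index(ch, start) + 1)
                for sid, start in projected
                if sid in sids
            ]
            grow(pattern, next_projected)

    grow("", [(i, 0) for i in range(len(sequences))])
    return results


def is_subsequence(small, big):
    """True if small can be obtained from big by deleting characters."""
    it = iter(big)
    return all(ch in it for ch in small)


def closed_only(frequent):
    """Keep patterns that have no longer super-sequence with equal support.
    This post-filter is for illustration only; efficient closed miners
    avoid it by pruning during the search."""
    return {
        p: s
        for p, s in frequent.items()
        if not any(
            s == s2 and len(q) > len(p) and is_subsequence(p, q)
            for q, s2 in frequent.items()
        )
    }


if __name__ == "__main__":
    toy = ["ACGT", "ACT", "AGT"]  # hypothetical toy sequences
    frequent = mine_frequent(toy, min_support=2)
    print(len(frequent), closed_only(frequent))
```

On this toy input with a minimum support of 2, the sketch finds 11 frequent patterns but only three closed ones (AT with support 3, ACT and AGT with support 2), which illustrates why the closed set is a more compact summary for long sequences over a small alphabet.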
Keywords
Text Mining, Clustering, Semi-supervised Learning, Constrained Clustering, Co-Clustering