EFFICIENTLY MINING CLOSED SEQUENCE PATTERNS IN DNA WITHOUT CANDIDATE GENERATION

Jawahar S; Harishchander A; Devaraju S; Reshmi S; Manivasagan C; Sumathi P

dc.contributor.author	Jawahar S
dc.contributor.author	Harishchander A
dc.contributor.author	Devaraju S
dc.contributor.author	Reshmi S
dc.contributor.author	Manivasagan C
dc.contributor.author	Sumathi P
dc.date.accessioned	2023-09-06T09:19:23Z
dc.date.available	2023-09-06T09:19:23Z
dc.date.issued	2020
dc.description.abstract	Sequential pattern mining is a technique that efficiently finds frequent patterns in small datasets. Traditional sequential pattern mining algorithms mine short sequences efficiently but are inefficient at mining long sequence patterns. These algorithms use candidate generation, which enlarges the search space and increases running time. Biological DNA sequences are long and use a small alphabet. Such data can be mined for co-occurring biological sequences, which are important for biological data analysis and data mining. Closed sequential pattern mining is used for mining long sequences and yields fewer patterns, because only closed sequences are reported. This paper proposes Closed Sequential Pattern Mining without Candidate Generation (CSPMCG), an algorithm that mines closed sequential patterns without candidate generation. It uses two pruning methods: BackScan pruning and a frequent occurrence check. The former prunes the search space, and the latter detects closed sequential patterns early in the run. Compared with the PrefixSpan and SPADE algorithms, the proposed algorithm achieves better scalability and interpretability. Experimental results on sample DNA datasets show that it outperforms the other algorithms in efficiency, memory use and running time.	en_US
dc.identifier.issn	2250-0480
dc.identifier.uri	https://dspace.psgrkcw.com/handle/123456789/3498
dc.language.iso	en_US	en_US
dc.publisher	International Journal Life Science Pharmacology Research	en_US
dc.subject	Text Mining	en_US
dc.subject	Clustering	en_US
dc.subject	Semi supervised Learning	en_US
dc.subject	Constrained Clustering	en_US
dc.subject	Co-Clustering	en_US
dc.title	EFFICIENTLY MINING CLOSED SEQUENCE PATTERNS IN DNA WITHOUT CANDIDATE GENERATION	en_US
dc.type	Article	en_US
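
The idea of a closed sequential pattern mentioned in the abstract can be illustrated with a small sketch. This is not the CSPMCG algorithm: the record does not describe how BackScan pruning or the frequent occurrence check work, so neither is implemented here. The sketch assumes that sequences are plain strings over the DNA alphabet, that a pattern is a subsequence (gaps allowed), and that support is the number of input sequences containing the pattern. It grows patterns by prefix projection, so no candidate set is generated, and then keeps only the patterns that have no longer super-sequence with the same support. The function names `mine_frequent`, `is_subsequence` and `closed_patterns` are hypothetical.

```python
from collections import defaultdict


def mine_frequent(sequences, min_support):
    """Return {pattern: support} using prefix-projection pattern growth."""
    results = {}

    def grow(prefix, projected):
        # projected holds (sequence index, start position) pairs: the suffix
        # of each sequence left after the earliest match of the prefix.
        first_pos = defaultdict(dict)  # symbol -> {sequence index: position}
        for sid, start in projected:
            seq = sequences[sid]
            for pos in range(start, len(seq)):
                # Record only the earliest occurrence per sequence.
                first_pos[seq[pos]].setdefault(sid, pos)
        for symbol, occurrences in sorted(first_pos.items()):
            if len(occurrences) >= min_support:
                pattern = prefix + symbol
                results[pattern] = len(occurrences)
                # Project each supporting sequence past the matched symbol.
                grow(pattern, [(sid, pos + 1) for sid, pos in occurrences.items()])

    grow("", [(sid, 0) for sid in range(len(sequences))])
    return results


def is_subsequence(short, long):
    """True if short can be obtained from long by deleting symbols."""
    remaining = iter(long)
    return all(symbol in remaining for symbol in short)


def closed_patterns(frequent):
    """Keep patterns that have no longer super-sequence with equal support."""
    closed = {}
    for pattern, support in frequent.items():
        absorbed = any(
            len(other) > len(pattern)
            and other_support == support
            and is_subsequence(pattern, other)
            for other, other_support in frequent.items()
        )
        if not absorbed:
            closed[pattern] = support
    return closed


if __name__ == "__main__":
    frequent = mine_frequent(["ACGT", "ACT", "AGT"], min_support=2)
    print(len(frequent))              # 11 frequent patterns
    print(closed_patterns(frequent))  # {'ACT': 2, 'AGT': 2, 'AT': 3}
```

On this toy input the 11 frequent patterns collapse to 3 closed ones, which shows why reporting only closed patterns gives a smaller result. The closedness filter here is a post-processing step over all frequent patterns; an algorithm such as the one the abstract describes would instead aim to detect closed patterns during the search.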

Files

Original bundle

Name: EFFICIENTLY MINING CLOSED SEQUENCE PATTERNS IN DNA WITHOUT CANDIDATE GENERTION.docx
Size: 263.76 KB
Format: Microsoft Word XML

Collections

International Journals