AN OPTIMIZED WEIGHTED CONSENSUS CLUSTERING WITH REMOVAL OF LESS INFORMATIVE COMPOSITE CLUSTERS
Date
2022-03
Publisher
IEEE
Abstract
Accurately classifying cancer from large volumes of Gene Expression Data (GED) is among the most demanding tasks in clinical diagnostics. This article proposes a Weighted Consensus of Lion Optimized K-means Ensemble with Peak Density Clustering (WECLO K-means-PDC) algorithm, which discards the less informative composite clusters to increase the accuracy of GED classification. The algorithm refines the clusters at every iteration by considering the Symmetric Neighbourhood (SN) correlation among data elements. In place of random subspace and sampling, the lion optimization algorithm is applied, with the cluster validation metrics serving as its fitness functions. In addition, an SN Graph (SNG) is constructed over the data elements using adaptive PDC combined with K-means clustering. The SNG helps to choose the number of cluster centroids and refines the clusters at each iteration of K-means clustering without computing the cut-off distance between two data elements. Using the SNG, data elements with fewer than two neighbours are identified as outliers. Moreover, all data elements are allocated to a suitable cluster by a breadth-first search on the SNG, and the less informative composite clusters are removed. Finally, the experimental results show that WECLO K-means-PDC achieves accuracies of 85%, 85.4%, 84.8%, 84.3% and 85% on the Leukemia, Lymphoma, Prostate cancer, SRBCT and breast cancer databases, respectively, in comparison with the classical algorithms.
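The SNG steps named in the abstract (graph construction, outliers with fewer than two neighbours, allocation by breadth-first search) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes that two data elements are symmetric neighbours when each lies among the other's k nearest neighbours under Euclidean distance, and that breadth-first search groups the non-outlier elements into connected components. The function names and the parameters `k` and `min_neighbours` are hypothetical, and the lion optimization, PDC, K-means and weighted consensus stages are omitted.

```python
from collections import deque
from math import dist


def knn(points, k):
    """For each point, return the set of indices of its k nearest neighbours."""
    result = []
    for i, p in enumerate(points):
        others = (j for j in range(len(points)) if j != i)
        order = sorted(others, key=lambda j: dist(p, points[j]))
        result.append(set(order[:k]))
    return result


def symmetric_neighbourhood_graph(points, k):
    """Assumed SN rule: link i and j only if each is a k-nearest neighbour of the other."""
    nn = knn(points, k)
    return [{j for j in nn[i] if i in nn[j]} for i in range(len(points))]


def bfs_clusters(graph, min_neighbours=2):
    """Label connected components by BFS; elements with too few neighbours get -1."""
    outliers = {i for i, nbrs in enumerate(graph) if len(nbrs) < min_neighbours}
    labels = [-1] * len(graph)
    next_label = 0
    for start in range(len(graph)):
        if start in outliers or labels[start] != -1:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nbr in graph[node]:
                if nbr not in outliers and labels[nbr] == -1:
                    labels[nbr] = next_label
                    queue.append(nbr)
        next_label += 1
    return labels


# Toy data: two tight groups and one isolated element (not gene expression data).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
graph = symmetric_neighbourhood_graph(points, k=3)
print(bfs_clusters(graph))  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this toy run the isolated element has no symmetric neighbours, so it is marked as an outlier with label -1, while each tight group forms one cluster. No cut-off distance is set anywhere; only the neighbour count `k` is chosen, which is an assumption of this sketch.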
Keywords
gene expression data, semi-supervised clustering, WECR K-means, cluster ensemble, peak density clustering, symmetric neighbourhood