AN OPTIMIZED WEIGHTED CONSENSUS CLUSTERING WITH REMOVAL OF LESS INFORMATIVE COMPOSITE CLUSTERS

Date

2022-03

Publisher

IEEE

Abstract

Accurately classifying cancer from large volumes of Gene Expression Data (GED) is among the most demanding tasks in clinical diagnostics. This article proposes a Weighted Consensus of Lion Optimized K-means Ensemble with Peak Density Clustering (WECLO K-means-PDC) algorithm, which discards the less informative composite clusters to increase the accuracy of GED classification. The algorithm refines the clusters at every iteration by considering the Symmetric Neighbourhood (SN) correlation among data elements. Instead of random subspace and sampling, it applies the lion optimization algorithm, using cluster validation metrics as fitness functions for effective clustering. An SN Graph (SNG) is also constructed over the data elements using adaptive PDC combined with K-means clustering. The SNG helps to choose the number of cluster centroids and refines the clusters at each iteration of K-means clustering without computing the cut-off distance between two data elements. Using the SNG, outliers are identified as the data elements having fewer than two neighbours. All data elements are then allocated to a suitable cluster by a breadth-first search on the SNG, and the less informative composite clusters are removed. Experimental results show that WECLO K-means-PDC achieves accuracies of 85%, 85.4%, 84.8%, 84.3% and 85% on the Leukemia, Lymphoma, Prostate cancer, SRBCT and breast cancer databases, respectively, in comparison with classical algorithms.
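
The outlier rule and the breadth-first allocation described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes that the symmetric neighbourhood can be approximated by a mutual k-nearest-neighbour relation (two elements are linked only if each is among the other's k nearest), and the function names `mutual_knn_graph` and `bfs_clusters`, the parameter `k`, and the toy points are all hypothetical. The lion optimization, PDC and consensus-weighting steps are omitted.

```python
from collections import deque
from math import dist


def mutual_knn_graph(points, k):
    """Assumed stand-in for the SNG: link i and j only if each one
    is among the other's k nearest neighbours (Euclidean distance)."""
    n = len(points)
    knn = []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i),
                       key=lambda j: dist(points[i], points[j]))
        knn.append(set(order[:k]))
    return [{j for j in knn[i] if i in knn[j]} for i in range(n)]


def bfs_clusters(graph, min_neighbours=2):
    """Mark elements with fewer than min_neighbours links as outliers (-1),
    then label the remaining connected components by breadth-first search."""
    labels = [None] * len(graph)
    for i, neighbours in enumerate(graph):
        if len(neighbours) < min_neighbours:
            labels[i] = -1
    next_label = 0
    for start in range(len(graph)):
        if labels[start] is not None:
            continue
        labels[start] = next_label
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in graph[node]:
                if labels[nb] is None:
                    labels[nb] = next_label
                    queue.append(nb)
        next_label += 1
    return labels


# Toy data (hypothetical): two tight groups and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0),
          (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0),
          (5.0, 30.0)]
graph = mutual_knn_graph(points, k=3)
labels = bfs_clusters(graph)
print(labels)  # [0, 0, 0, 0, 1, 1, 1, 1, -1]
```

In this sketch the isolated point has no mutual neighbours, so it is flagged as an outlier before the search starts, and the number of clusters falls out of the graph structure rather than from a cut-off distance.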

Keywords

gene expression data, semi-supervised clustering, WECR K-means, cluster ensemble, peak density clustering, symmetric neighbourhood