Keywords: Microarray Technology, Gene Expression Data, Gene Selection, Clustering Algorithms, Clustering Validation


DNA microarray technology measures gene expression levels under different experimental conditions. A single microarray experiment profiles thousands of genes, but not all of them are significant: some are noisy or irrelevant. Gene selection algorithms are therefore an important step in knowledge discovery, because they retain the more informative genes. Another central goal of gene expression analysis is to identify genes with similar expression patterns by clustering. Clustering, a key data mining technique, divides genes into groups so that genes within the same group have similar features and share common biological functions. In this study, mutual information is used for gene selection because it can detect nonlinear relationships between genes. The K-Means algorithm is then applied to cluster the selected genes. The results show that the proposed approach refines gene expression data, improving cluster quality, handling noise effectively, and reducing the computational space.
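The two-stage idea in the abstract (mutual-information gene selection followed by K-Means) can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not give the scoring rule, so ranking each gene by its mean mutual information with all other genes is an assumption, as are the equal-width binning, the function names (`discretize`, `mutual_information`, `select_genes`, `kmeans`), and the parameters `bins`, `keep`, `iters`, and `seed`.

```python
import math
import random
from collections import Counter


def discretize(values, bins=3):
    """Map continuous expression values to equal-width bin indices."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant profile: everything falls in one bin
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y, bins=3):
    """Estimate I(X;Y) in bits from two equal-length expression profiles."""
    dx, dy = discretize(x, bins), discretize(y, bins)
    n = len(dx)
    px, py, pxy = Counter(dx), Counter(dy), Counter(zip(dx, dy))
    # sum over observed pairs of p(a,b) * log2( p(a,b) / (p(a) p(b)) )
    return sum((c / n) * math.log2(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())


def select_genes(profiles, keep, bins=3):
    """Return indices of the `keep` genes with the highest mean MI.

    Assumption: a gene is 'informative' if it shares information with
    many other genes; the paper may use a different criterion.
    """
    g = len(profiles)
    scores = []
    for i in range(g):
        total = sum(mutual_information(profiles[i], profiles[j], bins)
                    for j in range(g) if j != i)
        scores.append(total / (g - 1))
    ranked = sorted(range(g), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's algorithm; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # assignment step: nearest center by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        # update step: move each center to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy data: rows are genes, columns are experimental conditions.
    profiles = [
        [1.0, 2.0, 3.0, 4.0],
        [2.0, 4.0, 6.0, 8.0],
        [5.0, 5.0, 5.0, 5.0],  # flat gene, carries no information
        [4.0, 3.0, 2.0, 1.0],
    ]
    kept = select_genes(profiles, keep=3)
    labels = kmeans([profiles[i] for i in kept], k=2)
    print("kept genes:", kept, "cluster labels:", labels)
```

In this sketch a flat gene has zero mutual information with every other gene and is ranked last, which mirrors the abstract's goal of removing uninformative genes before clustering. Cluster quality on the reduced data could then be checked with a validity index such as the silhouette, which is not shown here.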


Author Biographies

Ameer Ali AL-Mshanji, University of Babylon
Software department
Sura Zaki Al-Rashid, University of Babylon
Software department


How to Cite
AL-Mshanji, A. A., & Al-Rashid, S. Z. (2019). IMPROVING CLUSTERING ALGORITHM FOR GENE EXPRESSION DATA USING HYBRID ALGORITHM. COMPUSOFT: An International Journal of Advanced Computer Technology, 8(9). Retrieved from