IMPROVING CLUSTERING ALGORITHM FOR GENE EXPRESSION DATA USING HYBRID ALGORITHM
DNA microarray technology can measure the expression levels of thousands of genes under different experimental conditions. Not all of these genes are significant, however; some are noisy or irrelevant. Gene selection is therefore an important step in knowledge discovery, because it retains the more informative genes. Another central goal of gene expression analysis is to identify genes with similar expression patterns through clustering. Clustering is a crucial data mining process: it divides genes into groups so that genes within the same group have similar features and share common biological functions. In this study, mutual information is applied for gene selection because it can detect nonlinear relationships between genes. The K-Means algorithm is then applied to cluster the selected data. The results show that the proposed approach can refine gene expression data to improve cluster quality, handle noise effectively, and reduce the computational space.
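The two-stage pipeline described above (mutual-information gene selection followed by K-Means) can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration, since the abstract does not specify them: each gene is scored by its mean pairwise mutual information with the other genes, expression profiles are discretized with equal-width bins, and the function names (`discretize`, `mutual_information`, `select_genes`, `kmeans`), the toy data, and the parameter values are hypothetical.

```python
import math
import random
from collections import Counter

def discretize(values, bins=3):
    """Equal-width binning of one gene's expression profile (assumed scheme)."""
    lo, hi = min(values), max(values)
    if hi == lo:                      # flat profile: everything falls in one bin
        return [0] * len(values)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in values]

def mutual_information(x, y):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * math.log(c * n / (px[a] * py[b]))
               for (a, b), c in pxy.items())

def select_genes(expr, keep, bins=3):
    """Return indices of the `keep` genes with the highest mean MI to all other genes."""
    disc = [discretize(g, bins) for g in expr]
    scores = []
    for i, gi in enumerate(disc):
        others = [mutual_information(gi, gj) for j, gj in enumerate(disc) if j != i]
        scores.append(sum(others) / len(others))
    ranked = sorted(range(len(expr)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:keep])

def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's K-Means with seeded random initial centers; returns labels."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        new_labels = [
            min(range(k),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centers[c])))
            for pt in points
        ]
        if new_labels == labels:      # assignments stable: converged
            break
        labels = new_labels
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:               # keep the old center if a cluster empties
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

# Toy data (hypothetical): rows are genes, columns are experimental conditions.
expr = [
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],   # rising pattern
    [1.1, 2.1, 3.0, 4.2, 5.1, 6.0],   # rising pattern
    [3.0, 3.0, 3.0, 3.0, 3.0, 3.0],   # flat, uninformative gene
    [6.0, 5.0, 4.0, 3.0, 2.0, 1.0],   # falling pattern
    [6.1, 5.0, 4.1, 3.0, 2.2, 1.1],   # falling pattern
]

selected = select_genes(expr, keep=4)
labels = kmeans([expr[i] for i in selected], k=2)
print(selected, labels)
```

On this toy input the flat gene receives a score of zero and is dropped, and the remaining rising and falling genes fall into two separate clusters. With real microarray data the bin count, the number of genes kept, and the value of k would all need to be chosen with care, for example by using a cluster validity index.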