Machine learning techniques to discover genes with potential prognosis role in Alzheimer's disease using different biological sources
Martinez-Ballesteros, Maria; Garcia-Heredia, Jose M.; Nepomuceno-Chamorro, Isabel A.; Riquelme-Santos, Jose C.
Publicación: INFORMATION FUSION
2017
VL / 36 - BP / 114 - EP / 129
abstract
Alzheimer's disease is a complex progressive neurodegenerative brain disorder, being its prevalence expected to rise over the next decades. Unconventional strategies for elucidating the genetic mechanisms are necessary due to its polygenic nature. In this work, the input information sources are five: a public DNA microarray that measures expression levels of control and patient samples, repositories of known genes associated to Alzheimer's disease, additional data, Gene Ontology and finally, a literature review or expert knowledge to validate the results. As methodology to identify genes highly related to this disease, we present the integration of three machine learning techniques: particularly, we have used decision trees, quantitative association rules and hierarchical cluster to analyze Alzheimer's disease gene expression profiles to identify genes highly linked to this neurodegenerative disease, through changes in their expression levels between control and patient samples. We propose an ensemble of decision trees and quantitative association rules to find the most suitable configurations of the multi-objective evolutionary algorithm GarNet, in order to overcome the complex parametrization intrinsic to this type of algorithms. To fulfill this goat, GarNet has been executed using multiple configuration settings and the well-known C4.5 has been used to find the minimum accuracy to be satisfied. Then, GarNet is rerun to identify dependencies between genes and their expression levels, so we are able to distinguish between healthy individuals and Alzheimer's patients using the configurations that overcome the minimum threshold of accuracy defined by C4.5 algorithm. Finally, a hierarchical cluster analysis has been used to validate the obtained gene-Alzheimer's Disease associations provided by GarNet The results have shown that the obtained rules were able to successfully characterize the underlying information, grouping relevant genes for Alzheimer Disease. The genes reported by our approach provided two well defined groups that perfectly divided the samples between healthy and Alzheimer's Disease patients. To prove the relevance of the obtained results, a statistical test and gene expression fold-change were used. Furthermore, this relevance has been summarized in a volcano plot, showing two clearly separated and significant groups of genes that are up or down-regulated in Alzheimer's Disease patients. A biological knowledge integration phase was performed based on the information fusion of systematic literature review, enrichment Gene Ontology terms for the described genes found in the hippocampus of patients. Finally, a validation phase with additional data and a permutation test is carried out, being the results consistent with previous studies. (C) 2016 Elsevier B.V. All rights reserved.
MENTIONS DATA
Computer Science
-
0 Twitter
-
2 Wikipedia
-
0 News
-
2 Policy
Publicaciones similares en Computer Science