Study of Genes Associated With Parkinson Disease Using Feature Selection

Document Type: Original Article


1 Department of Computer Science, Memorial University of Newfoundland, NF, Canada

2 Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, Boca Raton, FL, USA

3 Department of Computer Science, Aarhus University, Aarhus, Denmark

4 Department of Computer and Electrical Engineering and Computer Science, Florida Atlantic University, FL, USA



Parkinson's disease (PD) is the second most prevalent age-related neurodegenerative disease, and the genes associated with human diseases such as PD help characterize them. Genome-wide association studies (GWAS) are used to identify the genes associated with PD and other diseases. Knowledge of the identified genes enables scientists to diagnose diseases early, treat them, and stop them. Because of the complexity of the illness, identifying such genes is a challenging task. In this article, we apply two feature selection methods, Perturbation-based Feature Selection (PFS) and Hilbert-Schmidt independence criterion Lasso (HSIC-Lasso), to choose a subset of genes that predicts PD with high classification precision. We then analyze the chromosomes corresponding to the selected features to identify which chromosomes play an important role in PD. The dataset consists of gene expression profiles from 50 patients with predominantly early-stage PD and 55 normal samples from the Gene Expression Omnibus (GEO). These methods yield a set of features involved in disease-specific processes, which can be used to prioritize candidate genes in GWAS loci.
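To make the idea of dependence-based gene selection concrete, the sketch below scores each feature by its empirical HSIC with the class label, which is the relevance quantity that HSIC-Lasso builds on. It is a simplified illustration and not the pipeline used in this article: it omits the Lasso step that removes redundant features, it uses small synthetic data in place of the GEO expression profiles, and the function names, the Gaussian kernel width `sigma`, and the sample sizes are all assumptions made for the example.

```python
import math
import random


def standardize(x):
    """Scale a feature to zero mean and unit variance."""
    n = len(x)
    mean = sum(x) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in x) / n) or 1.0
    return [(v - mean) / std for v in x]


def gaussian_gram(x, sigma=1.0):
    """Gaussian-kernel Gram matrix for a one-dimensional feature."""
    return [[math.exp(-((a - b) ** 2) / (2 * sigma ** 2)) for b in x] for a in x]


def delta_gram(y):
    """Delta kernel for class labels: 1 if two samples share a class, else 0."""
    return [[1.0 if a == b else 0.0 for b in y] for a in y]


def center(K):
    """Double-center a Gram matrix, i.e. compute H K H with H = I - (1/n) 11^T."""
    n = len(K)
    row = [sum(r) / n for r in K]
    col = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    total = sum(row) / n
    return [[K[i][j] - row[i] - col[j] + total for j in range(n)] for i in range(n)]


def hsic(x, y, sigma=1.0):
    """Biased empirical HSIC between a feature x and labels y."""
    n = len(x)
    Kc = center(gaussian_gram(x, sigma))
    Lc = center(delta_gram(y))
    # trace(Kc Lc) equals the sum of elementwise products for symmetric matrices
    return sum(Kc[i][j] * Lc[i][j] for i in range(n) for j in range(n)) / (n - 1) ** 2


def rank_features(X, y, sigma=1.0):
    """Return feature indices sorted by decreasing HSIC with the label, plus scores."""
    scores = [hsic(standardize(list(col)), y, sigma) for col in zip(*X)]
    order = sorted(range(len(scores)), key=lambda k: scores[k], reverse=True)
    return order, scores


# Synthetic stand-in for expression data: 40 samples, 5 "genes".
# Only gene 2 is shifted between the two classes (an assumption for the demo).
random.seed(0)
y = [i % 2 for i in range(40)]
X = [[random.gauss(0, 1) + (3.0 * label if k == 2 else 0.0) for k in range(5)]
     for label in y]

ranking, scores = rank_features(X, y)
print("features ranked by HSIC with the label:", ranking)
```

In a real analysis the top-ranked genes would then be passed to a classifier and evaluated with cross-validation, and a full HSIC-Lasso solver would additionally penalize genes that carry redundant information.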

