Prediction and importance of predictors in approaches based on computational intelligence and machine learning

  • Antônio Carlos Silva Júnior Federal University of Viçosa
  • Waldênia Melo Moura Empresa de Pesquisa Agropecuária de Minas Gerais
  • Leonardo Lopes Bhering Federal University of Viçosa
  • Michele Jorge Silva Siqueira Federal University of Viçosa
  • Weverton Gomes Costa Federal University of Viçosa
  • Moysés Nascimento Federal University of Viçosa
  • Cosme Damião Cruz Federal University of Viçosa
Keywords: Plant breeding, big data, artificial neural networks, decision tree, bagging, random forest, boosting

Abstract

Machine learning and computational intelligence are rapidly gaining ground in plant breeding, allowing breeders to explore big data concepts and to predict the importance of predictors. In this context, the main challenges are how to analyze datasets and extract new knowledge at all levels of research. Predicting the importance of variables in genetic improvement programs allows for faster progress, extensive phenotypic evaluation of the germplasm, and the selection and prediction of traits that have low heritability and/or are difficult to measure. Although simultaneous evaluation of traits provides a wide variety of information, identifying which predictor variable is most important is a challenge for the breeder. The traditional approach to variable selection is based on multiple linear regression, which evaluates the relationship between a response variable and two or more independent variables. However, this approach is limited in its ability to analyze high-dimensional data and does not capture complex, multivariate relationships between traits. In contrast, machine learning and computational intelligence approaches allow inferences about complex interactions in plant breeding. Given this, a systematic review that disentangles machine learning and computational intelligence approaches is relevant to breeders and is the aim of this work. We present the main steps for developing each strategy (from data selection to evaluating classification/prediction models and quantifying the best predictor).
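
As a rough illustration of the traditional approach mentioned in the abstract, the sketch below fits a multiple linear regression and ranks two predictors by their standardized coefficients. It is a minimal sketch, not the authors' procedure: the trait values, the variable names (`x1`, `x2`, `y`) and the helper functions `fit_ols` and `standardized_coefficients` are all hypothetical, and the response is constructed without noise so that the fit is exact.

```python
from statistics import pstdev


def fit_ols(X, y):
    """Ordinary least squares with an intercept, via the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(r) for r in X]
    p = len(rows[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda k: abs(A[k][c]))
        A[c], A[piv] = A[piv], A[c]
        pivot = A[c][c]
        A[c] = [v / pivot for v in A[c]]
        for k in range(p):
            if k != c:
                f = A[k][c]
                A[k] = [vk - f * vc for vk, vc in zip(A[k], A[c])]
    return [A[i][p] for i in range(p)]  # [intercept, b1, b2, ...]


def standardized_coefficients(X, y):
    """Scale each slope by sd(x_j) / sd(y) so predictors are comparable across units."""
    b = fit_ols(X, y)
    sd_y = pstdev(y)
    cols = list(zip(*X))
    return [b[j + 1] * pstdev(cols[j]) / sd_y for j in range(len(cols))]


# Hypothetical auxiliary traits (assumed values, for illustration only).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [5.0, 3.0, 8.0, 1.0, 7.0, 2.0, 6.0, 4.0]
# Hypothetical response built from a known linear rule, without noise.
y = [2.0 + 3.0 * a + 0.5 * b for a, b in zip(x1, x2)]

X = list(zip(x1, x2))
coefs = fit_ols(X, y)
importance = standardized_coefficients(X, y)

ranking = sorted(zip(["x1", "x2"], importance), key=lambda t: abs(t[1]), reverse=True)
print(ranking)
```

Standardized coefficients only describe linear, additive effects, which is the limitation the abstract attributes to the regression approach. Methods such as decision trees, random forests or neural networks would need a different importance measure.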

 

Author Biographies

Antônio Carlos Silva Júnior, Federal University of Viçosa

Department of General Biology

Leonardo Lopes Bhering, Federal University of Viçosa

Department of General Biology

Michele Jorge Silva Siqueira, Federal University of Viçosa

Department of General Biology

Weverton Gomes Costa, Federal University of Viçosa

Department of General Biology

Moysés Nascimento, Federal University of Viçosa

Department of Statistics

Cosme Damião Cruz, Federal University of Viçosa

Department of General Biology

Published
2023-03-08
How to Cite
Silva Júnior, A. C., Moura, W. M., Bhering, L. L., Siqueira, M. J. S., Costa, W. G., Nascimento, M., & Cruz, C. D. (2023). Prediction and importance of predictors in approaches based on computational intelligence and machine learning. Agronomy Science and Biotechnology, 9, 1-24. https://doi.org/10.33158/ASB.r179.v9.2023
