2023 journal article

Machine learning and data mining methodology to predict nominal and numeric performance body weight values using Large White male turkey datasets


author keywords: turkey production; machine learning; data mining; turkey performance
Source: Web Of Science
Added: October 16, 2023

Large biological datasets with many variables and few biological replicates (as in "omics" sciences and industry data) are challenging to analyze with traditional inferential statistics. Statistical models are well suited to data containing more observations than variables. However, their power to detect actual differences is reduced when the number of comparisons exceeds the number of experimental replicates or observations. Machine learning (ML) allows researchers to evaluate treatment groups or multiple categories of variables with fewer observations. Thus, it has become a tool for predicting phenomena and evaluating relationships within datasets that are less suited to traditional statistics. Data mining (DM) helps researchers identify the most critical variables in an ML predictive model, and its output can be interpreted in a way akin to "statistical significance." This study aimed to develop ML and DM methodologies and apply them to predict Large White male turkey body weight (BW). Data from a previously reported study were used. Bird BW, weekly BW gain (BWG), feed intake (FI), feed conversion ratio (FCR), small intestine pH, cloacal temperature, density, microbiome taxa, and litter content of Mn and Zn were used as variables for the ML analysis. A total of 253 variables were used in the ML and DM analysis. For the ML analysis, BW and FI at 18 wk were classified as low, objective, or high based on margins of 5% for BW and 3% for FI relative to the Aviagen male turkey objectives. The WEKA 3.8.5 Experimenter tool was used to run various classification and regression algorithms with 10-fold cross-validation to predict 18 wk BW from the input data. Of the 3 models constructed, the most practical one was built with a single algorithm and achieved a correlation of 0.73 and a root square error of 0.26 based only on turkey 14 wk BW.
In conclusion, these ML and DM tools could be applied to turkey research and production systems to predict growth performance from large datasets.
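
The two procedural steps named in the abstract, labeling a value against a performance objective and scoring a single-predictor model with 10-fold cross-validation, can be illustrated with a minimal Python sketch. It is not the authors' WEKA workflow. The function names, the symmetric reading of the margin, the use of ordinary least squares as the regression algorithm, and the synthetic data are all assumptions made for illustration. The error metric computed here is the root mean squared error, which may not be the exact quantity the abstract reports as "root square error."

```python
import math
import random


def label_against_objective(value, objective, margin):
    """Label a value as low / objective / high relative to a target.

    Assumption: the margin is symmetric, so margin=0.05 gives an
    'objective' band from 95% to 105% of the target.
    """
    lower = objective * (1.0 - margin)
    upper = objective * (1.0 + margin)
    if value < lower:
        return "low"
    if value > upper:
        return "high"
    return "objective"


def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def pearson(a, b):
    """Pearson correlation between two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    return cov / math.sqrt(var_a * var_b)


def cross_validate(xs, ys, k=10, seed=0):
    """k-fold CV: each observation is predicted by a model fitted without it."""
    indices = list(range(len(xs)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    actual, predicted = [], []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        intercept, slope = fit_line([xs[i] for i in train],
                                    [ys[i] for i in train])
        for i in held_out:
            actual.append(ys[i])
            predicted.append(intercept + slope * xs[i])
    rmse = math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )
    return pearson(actual, predicted), rmse


# Synthetic stand-in data, NOT the study's measurements.
rng = random.Random(42)
bw_14wk = [rng.uniform(12.0, 15.0) for _ in range(60)]
bw_18wk = [1.3 * x + rng.gauss(0.0, 0.3) for x in bw_14wk]

correlation, rmse = cross_validate(bw_14wk, bw_18wk, k=10)
print(f"correlation={correlation:.2f}, RMSE={rmse:.2f}")

# Hypothetical objective of 20.0 with a 5% margin.
print(label_against_objective(18.0, objective=20.0, margin=0.05))
```

The correlation and error are computed only from held-out predictions, so both reflect how well the model generalizes to birds it was not fitted on rather than how closely it fits its training data.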