2024 article
In-season Sweetpotato Yield Forecasting using Multitemporal Remote Sensing Environmental Observations and Machine Learning
Carbajal-Carrasco, M., Jones, D., Williams, C., & Nelson, N. (2024, April 26).
Data-driven modeling approaches for crop yield prediction have increased rapidly in the last decade due to the greater availability of spatial data from various sensors. Yet most yield modeling has focused on major commodities, leaving lesser-cultivated horticultural crops like sweetpotato relatively underserved, though these crops contribute considerably to the global economy and food supply. The U.S. is the primary exporter of sweetpotato (271 K tonnes), with 21% of U.S.-grown sweetpotatoes being exported. Early yield forecasting at the county scale offers crucial insights for growers, packers, wholesalers, and associated industries, enabling them to anticipate variations in yield and make informed decisions. While roots and tubers have demonstrated a relationship between yields and above-ground plant characteristics, it remains uncertain whether forecasting models that utilize remotely sensed data, including vegetation indices, are suitable for sweetpotato. We developed county-scale in-season sweetpotato yield forecast models using machine learning (ML) algorithms and multitemporal remote sensing environmental data. Four of the most commonly used ML algorithms for predicting crop yield, namely Random Forest Regression (RFR), Artificial Neural Networks (ANN), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGB), were applied using stationary (topography and soil characteristics) and temporal (weather, NDVI, and Growing Degree Days) variables as potential predictors. Six predictor sets were tested to identify key predictor variables, the optimal aggregation time of the temporal variables (16- or 32-day composites), and how early in the growing season the models can reliably predict end-of-season yields. U.S. Annual CropScape land cover layers were used to identify sweetpotato fields, over which temporal variables were aggregated, and sweetpotato yields were tabulated from the USDA Agricultural Survey from 2008 to 2022.
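As an illustration of how daily temporal variables such as Growing Degree Days might be aggregated into 16- or 32-day composites, the following minimal sketch may help. It is not the authors' pipeline: the base temperature, the simple averaging formula for daily GDD, the choice of summing within each window, and all function names and temperature values are assumptions made for the example.

```python
def growing_degree_days(t_min, t_max, t_base=15.0):
    """Daily GDD by the simple averaging method, floored at zero.

    t_base is an assumed base temperature (deg C), not a value from the article.
    """
    return max(0.0, (t_min + t_max) / 2.0 - t_base)


def composite(values, window=16, reducer=sum):
    """Aggregate a daily series into consecutive fixed-length windows."""
    return [reducer(values[i:i + window]) for i in range(0, len(values), window)]


# Hypothetical 32 days of daily minimum and maximum temperatures (deg C).
t_min = [18.0] * 16 + [20.0] * 16
t_max = [30.0] * 16 + [32.0] * 16

daily_gdd = [growing_degree_days(lo, hi) for lo, hi in zip(t_min, t_max)]

# Accumulated GDD per 16-day and per 32-day composite period.
gdd_16 = composite(daily_gdd, window=16)
gdd_32 = composite(daily_gdd, window=32)
```

For a variable such as NDVI, a mean or maximum per window would likely be a more natural reducer than a sum; `composite` accepts any function that maps a list to a single value.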
The Boruta method was used for feature selection across each predictor set before training the ML models. RFR outperformed the other ML algorithms, and the RFR models' evaluation metrics were the most consistent across the six predictor sets. The RFR model that incorporated early- and mid-season temporal variables as 16-day composites was selected and proposed for future sweetpotato yield forecasting due to its performance (R² = 0.44, RMSE = 3.53 tonnes ha⁻¹), as well as its ability to predict early enough in the season to provide actionable information. In the final model, several stationary variables (elevation, nitrogen, CEC, SOC, and clay content) were the most predictive of sweetpotato yield. After these stationary variables, NDVI and precipitation from the time around storage root initiation and bulking (July), and minimum temperature around planting (June), followed in importance.
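For readers unfamiliar with the two evaluation metrics reported above, the sketch below shows one common way to compute them. It assumes R² denotes the coefficient of determination (the article may define it differently, for example as a squared correlation), and the observed and predicted yields are hypothetical values, not data from the study.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error, in the same units as the yields."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical county yields (tonnes per hectare) and model predictions.
observed = [20.0, 24.0, 18.0, 22.0]
predicted = [21.0, 22.0, 19.0, 22.0]

model_rmse = rmse(observed, predicted)
model_r2 = r_squared(observed, predicted)
```

Under this definition, an R² of 0.44 would mean the model accounts for roughly 44% of the variance in observed yields, while the RMSE expresses the typical prediction error in tonnes per hectare.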