Potential Use of Data-Driven Models to Estimate and Predict Soybean Yields at National Scale in Brazil
Journal article, 2022

Large-scale assessment of crop yields plays a fundamental role in agricultural planning and in achieving food security goals. In this study, we evaluated the robustness of data-driven models for estimating soybean yields at 120 days after sowing (DAS) in the main producing regions of Brazil, and evaluated the reliability of the “best” data-driven model as a tool for early prediction of soybean yields for an independent year. Our methodology explicitly describes a general approach for combining publicly available databases and building data-driven models (multiple linear regression, MLR; random forests, RF; and support vector machines, SVM) to predict yields at large scales using gridded weather and soil data. We filtered out counties with missing or suspicious yield records, resulting in a crop yield database containing 3450 records (23 years × 150 “high-quality” counties). RF and SVM had similar results for the calibration and validation steps, whereas MLR showed the poorest performance. Our analysis revealed the potential of data-driven models for predicting soybean yields at large scales in Brazil around one month before harvest (i.e. at 90 DAS). Using a well-trained RF model to predict crop yield for a specific year at 90 DAS, the RMSE ranged from 303.9 to 1055.7 kg ha⁻¹, representing a relative error (rRMSE) between 9.2 and 41.5%. Although we demonstrated robust data-driven models for yield prediction at large scales in Brazil, there is still room for improving their accuracy. Including explanatory variables related to the crop (e.g. growing degree-days, flowering dates) and the environment (e.g. remotely sensed vegetation indices, number of dry and heat days during the cycle), as well as outputs from process-based crop simulation models (e.g. biomass, leaf area index and plant phenology), are potential strategies to improve model accuracy.
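The two error metrics reported above can be illustrated with a minimal Python sketch. The abstract does not define rRMSE, so the code assumes the common definition (RMSE divided by the mean observed yield, expressed in percent). The function names and the yield values are hypothetical and are not taken from the study.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between observed and predicted yields."""
    if not observed or len(observed) != len(predicted):
        raise ValueError("observed and predicted must be non-empty and of equal length")
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(observed))


def rrmse(observed, predicted):
    """Relative RMSE in percent.

    Assumption: RMSE is normalised by the mean observed yield; the
    abstract does not state which normalisation the authors used.
    """
    mean_observed = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_observed


# Hypothetical county yields in kg per hectare (illustrative only).
observed_yields = [3000.0, 3200.0, 2800.0, 3500.0]
predicted_yields = [2900.0, 3300.0, 3000.0, 3400.0]

rmse_value = rmse(observed_yields, predicted_yields)
rrmse_value = rrmse(observed_yields, predicted_yields)
print(f"RMSE:  {rmse_value:.1f} kg/ha")
print(f"rRMSE: {rrmse_value:.1f} %")
```

Under this definition, a given RMSE corresponds to a larger rRMSE where mean observed yields are lower, so the two metrics need not rank years or regions in the same order.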

Machine learning approaches

Public databases

Large-scale analysis

Geospatial and temporal variability

Climatic and soil variables

Author

Leonardo A. Monteiro

School of Agricultural Engineering (FEAGRI)

Food and Agriculture Organization of the United Nations

University of Kentucky

Rafael M. Ramos

UNIEURO University Center

Rafael Battisti

Federal University of Goiás

Johnny R. Soares

School of Agricultural Engineering (FEAGRI)

Julianne de Castro Oliveira

Chalmers, Technology Management and Economics, Environmental Systems Analysis

Gleyce K.D.A. Figueiredo

School of Agricultural Engineering (FEAGRI)

Rubens A.C. Lamparelli

Center of Energy Planning (NIPE)

Claas Nendel

Czech Academy of Sciences

University of Potsdam

Leibniz Centre for Agricultural Landscape Research (ZALF)

Marcos Alberto Lana

Swedish University of Agricultural Sciences (SLU)

International Journal of Plant Production

1735-6814 (ISSN) 1735-8043 (eISSN)

Vol. 16, No. 4, p. 691-703

Subject Categories (SSIF 2011)

Other Computer and Information Science

Bioinformatics (Computational Biology)

Physical Geography

DOI

10.1007/s42106-022-00209-0

More information

Latest update

3/7/2024