Performances of Machine Learning Algorithms in Predicting the Productivity of Conservation Agriculture at a Global Scale
Artikel i vetenskaplig tidskrift, 2022
Assessing the productive performance of conservation agriculture (CA) has become a major issue due to growing concerns about global food security and sustainability. Numerous experiments have been conducted to assess the performance of CA under various local conditions, and meta-analysis has become a standard approach in agricultural sector for analysing and summarizing the experimental data. Meta-analysis provides valuable synthetic information based on mean effect size estimation. However, summarizing large amounts of information by way of a single mean effect value is not always satisfactory, especially when considering agricultural practices. Indeed, their impacts on crop yields are often non-linear, and vary widely depending on a number of factors, including soil properties and local climate conditions. To address this issue, here we present a machine learning approach to produce data-driven global maps describing the spatial distribution of the productivity of CA versus conventional tillage (CT). Our objective is to evaluate and compare several machine-learning models for their ability in estimating the productivity of CA systems, and to analyse uncertainty in the model outputs. We consider different usages, including classification, point regression and quantile regression. Our approach covers the comparison of 12 different machine learning algorithms, model training, tuning with cross-validation, testing, and global projection of results. The performances of these algorithms are compared based on a recent global dataset including more than 4,000 pairs of crop yield data for CA vs. CT. We show that random forest has the best performance in classification and regression, while quantile regression forest performs better than quantile neural networks in quantile regression. The best algorithms are used to map crop productivity of CA vs. CT at the global scale, and results reveal that the performance of CA vs. CT is characterized by a strong spatial variability, and that the probability of yield gain with CA is highly dependent on geographical locations. This result demonstrates that our approach is much more informative than simply presenting average effect sizes produced by standard meta-analyses, and paves the way for such probabilistic, spatially-explicit approaches in many other fields of research.
conservation agriculture
Algorithm comparison
model performance
crop yield
machine learning