Characterizing Spatial Variability of Soil Organic Carbon through Improved Machine Learning Modeling with In-situ Data Resampling: A Case Study in Alaska
Journal article, 2025
Sparse and unevenly distributed soil samples across the northern high-latitude region greatly limit the accuracy of soil organic carbon (SOC) mapping. As a result, SOC estimates for this region differ substantially, which makes it challenging to characterize SOC spatial variability and its potential responses to climate change and permafrost degradation. To address these challenges, we enhanced a machine learning model for SOC mapping by developing a data resampling approach that accounts for the spatial heterogeneity of soil samples, using Alaska as a case study. Specifically, in-situ SOC data were resampled with weights proportional to the variance within a 15-km radius and then fitted using a random forest (RF) regression model. Multiple features, including temporal composites of Sentinel-1 C-band radar backscatter, vegetation indices from Sentinel-2, climate indices (including thawing and freezing indices) from the Moderate Resolution Imaging Spectroradiometer (MODIS), and ancillary topography data, were selected as inputs for the RF model after recursive feature elimination to generate top-layer (0-30 cm) SOC content maps of Alaska at 250-m resolution. The enhanced RF model with data resampling was more accurate than the original RF model: for surface (0-10 cm) SOC content, the coefficient of determination (R²) increased from 0.36 to 0.56 and the root mean square error (RMSE) decreased from 16% to 11%, while accuracy for deeper (10-30 cm) SOC content improved slightly. The enhanced RF model also captured local-scale SOC variability better than the original RF model and the SoilGrids 2.0 dataset, with high-resolution remote sensing indices playing a major role. The improved SOC content estimates were then used to estimate soil bulk density and calculate the total SOC stock for Alaska. Our results suggest that Alaskan topsoil (0-30 cm) stores approximately 25.21±17.18 Pg C, with the largest SOC reserves found in shrublands.
These findings highlight the importance of accounting for spatial heterogeneity in in-situ samples and leveraging high-resolution remote sensing data for regional soil mapping.
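The resampling step described in the abstract (weights proportional to the variance within a 15-km radius) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes that the variance is the SOC variance among samples within the radius (including the sample itself), that distances are great-circle distances, that draws are made with replacement, and that a small floor keeps isolated samples selectable. The function names `haversine_km`, `local_variance_weights`, and `resample_indices`, and the `floor` parameter, are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, used for great-circle distances


def haversine_km(lat, lon):
    """Pairwise great-circle distances (km) between all sample locations."""
    lat = np.radians(np.asarray(lat, dtype=float))
    lon = np.radians(np.asarray(lon, dtype=float))
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))


def local_variance_weights(lat, lon, soc, radius_km=15.0, floor=1e-6):
    """Sampling weights proportional to the local SOC variance.

    Assumption: the variance is taken over all samples within radius_km of
    each sample, the sample itself included. The floor is a hypothetical
    safeguard so that isolated samples (variance 0) are not excluded.
    """
    soc = np.asarray(soc, dtype=float)
    dist = haversine_km(lat, lon)
    local_var = np.empty(len(soc))
    for i in range(len(soc)):
        neighbours = soc[dist[i] <= radius_km]
        local_var[i] = neighbours.var()
    weights = local_var + floor
    return weights / weights.sum()


def resample_indices(weights, n_draws, seed=0):
    """Draw sample indices with replacement according to the weights."""
    rng = np.random.default_rng(seed)
    return rng.choice(len(weights), size=n_draws, replace=True, p=weights)


# Toy data: a heterogeneous cluster (first three) and a homogeneous one.
lat = [64.00, 64.01, 64.02, 66.00, 66.01, 66.02]
lon = [-150.0] * 6
soc = [5.0, 30.0, 55.0, 20.0, 20.5, 21.0]  # SOC content in percent

weights = local_variance_weights(lat, lon, soc)
train_idx = resample_indices(weights, n_draws=12)
```

In this toy case the samples in the heterogeneous cluster receive far larger weights than those in the homogeneous cluster, so they dominate the resampled set. Under the same assumptions, `train_idx` would then select the rows of the feature matrix and SOC vector used to fit the random forest regression model.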
machine learning
multi-source remote sensing
soil organic carbon
data resampling