A cross-validation-based statistical theory for point processes
Journal article, 2024
Motivated by cross-validation’s general ability to reduce overfitting and mean square error, we develop a cross-validation-based statistical theory for general point processes. It is based on the combination of two novel concepts for general point processes: cross-validation and prediction errors. Our cross-validation approach uses thinning to split a point process/pattern into pairs of training and validation sets, while our prediction errors measure discrepancy between two point processes. The new statistical approach, which may be used to model different distributional characteristics, exploits the prediction errors to measure how well a given model predicts validation sets using associated training sets. Having indicated that our new framework generalizes many existing statistical approaches, we then establish different theoretical properties for it, including large sample properties. We further recognize that non-parametric intensity estimation is an instance of Papangelou conditional intensity estimation, which we exploit to apply our new statistical theory to kernel intensity estimation. Using independent thinning-based cross-validation, we numerically show that the new approach substantially outperforms the state of the art in bandwidth selection. Finally, we carry out intensity estimation for a dataset in forestry (Euclidean domain) and a dataset in neurology (linear network).
Papangelou conditional intensity
kernel intensity estimation
prediction
spatial statistics
thinning