A study on data de-pseudonymization in the smart grid
Paper in proceeding, 2015
In the transition to the smart grid, the electricity networks are becoming more data intensive with more data producing devices deployed, increasing both the opportunities and challenges in how the collected data are used. For example, in the Advanced Metering Infrastructure (AMI) the devices and their corresponding data give more information about the operational parameters of the environment but also details about the habits of the people living in the houses monitored by smart meters. Different anonymization techniques have been proposed to minimize privacy concerns, among them the use of pseudonyms. In this work we return to the question of the effectiveness of pseudonyms, by investigating how a previously reported methodology for de-pseudonymization performs given a more realistic and larger dataset than was previously used. We also propose and compare the results with our own simpler de-pseudonymization methodology.
Our results indicate, not surprisingly, that large realistic datasets are very important to properly understand how an experimental method performs. Results based on small datasets run the risk of not being generalizable. In particular, we show that the number of re-identified households by breaking pseudonyms is dependent on the size of the dataset and the period where the pseudonyms are constant and not changed. In the setting of the smart grid, results will even vary based on the season when the dataset was captured. Knowing that relative simple changes in the data collection procedure may significantly increase the resistance to de-anonymization attacks will help future AMI deployments.