The influence of dataset characteristics on privacy preserving methods in the Advanced Metering Infrastructure
Journal article, 2018
The computing and communication devices employed by cyber-physical, IoT-enabled systems generate large quantities of data. These data offer new possibilities but also raise a number of challenges, especially through their social implications. One such challenge is preserving the privacy of the individuals whose behavior generates the data. It is therefore important to study how the characteristics of these large datasets influence the efficiency of different privacy-enhancing methods, so that stakeholders can better understand the properties of their datasets and the conditions under which those datasets can be released to third parties.
In this paper we study the effect of Advanced Metering Infrastructure (AMI) dataset characteristics on privacy-preserving solutions previously proposed in the literature. We focus on three common characteristics (data granularity, retention time, and use of pseudonyms) and study their effect on two privacy violations: de-anonymization and de-pseudonymization. To better understand this effect, we model the adversary and describe its capabilities within a probabilistic framework.
We perform evaluations on a large dataset collected from a real AMI environment. Our results show that simple changes in the data collection procedure can help mitigate the outcome of these privacy violations.
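As an illustration only, the three dataset characteristics named above (data granularity, retention time and use of pseudonyms) can be sketched as simple transformations applied at collection time. The following Python sketch is not the method evaluated in the article; the function names, the (timestamp, kWh) reading format, the keyed-hash pseudonym scheme and all parameter values are assumptions made for the example.

```python
import hashlib
import hmac
from collections import defaultdict


def coarsen(readings, bucket_seconds):
    """Reduce granularity: sum (timestamp, kWh) readings into fixed-size buckets.

    Timestamps are assumed to be integer seconds; each bucket is labeled
    with the timestamp at which it starts.
    """
    buckets = defaultdict(float)
    for ts, kwh in readings:
        buckets[ts - ts % bucket_seconds] += kwh
    return sorted(buckets.items())


def apply_retention(readings, now, retention_seconds):
    """Limit retention time: keep only readings no older than retention_seconds."""
    return [(ts, kwh) for ts, kwh in readings if now - ts <= retention_seconds]


def pseudonym(meter_id, ts, secret, rotation_seconds):
    """Derive a pseudonym that changes every rotation_seconds.

    Assumption: a keyed hash (HMAC-SHA256) over the meter ID and the
    rotation period index, truncated to 16 hex characters.
    """
    period = ts // rotation_seconds
    message = f"{meter_id}:{period}".encode()
    return hmac.new(secret, message, hashlib.sha256).hexdigest()[:16]


# Hypothetical readings: four measurements from one meter.
readings = [(0, 1.0), (900, 2.0), (1800, 0.5), (3600, 1.5)]
secret = b"example-key"  # placeholder key for the example only

hourly = coarsen(readings, 3600)
recent = apply_retention(readings, now=3600, retention_seconds=1800)
same_day = pseudonym("meter-42", 100, secret, 86400) == pseudonym("meter-42", 5000, secret, 86400)
next_day = pseudonym("meter-42", 100, secret, 86400) != pseudonym("meter-42", 90000, secret, 86400)
```

In this sketch, a larger bucket size, a shorter retention window, or a shorter pseudonym rotation period each reduce the amount of linkable fine-grained data an adversary can observe. How much each change actually helps against de-anonymization or de-pseudonymization is the question the article evaluates; the sketch makes no claim about it.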