Language-Based Differential Privacy with Accuracy Estimations and Sensitivity Analyses

Elisabet Lobo Vesga

Doctoral thesis, 2023

This thesis focuses on the development of programming frameworks that enforce, by construction, desirable properties of software systems. In particular, we are interested in enforcing differential privacy -- a mathematical notion of data privacy -- while statically reasoning about the accuracy of computations, and in deriving the sensitivity of arbitrary functions to increase the expressiveness of these systems. To this end, we first introduce DPella, a programming framework for differentially-private queries that allows reasoning about the privacy and accuracy of data analyses. DPella provides a novel component that statically tracks the accuracy of different queries. This component leverages taint analysis to infer the statistical independence of the noise values added to ensure the privacy of the overall computation. As a result, DPella allows analysts to implement privacy-preserving queries and adjust the privacy parameters to meet accuracy targets, or vice versa.
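The trade-off between privacy parameters and accuracy targets can be made concrete with the textbook tail bound for Laplace noise. The sketch below is only an illustration under that assumption: it is not DPella's API, the function names are hypothetical, and it combines queries with a plain union bound rather than the independence-based reasoning described in the abstract.

```python
import math


def laplace_error_bound(sensitivity, epsilon, beta):
    """Return alpha such that Laplace noise of scale sensitivity/epsilon
    stays within [-alpha, alpha] with probability at least 1 - beta.

    Uses the tail bound P(|noise| > alpha) = exp(-alpha / scale).
    """
    if sensitivity <= 0 or epsilon <= 0 or not (0 < beta < 1):
        raise ValueError("need sensitivity > 0, epsilon > 0 and 0 < beta < 1")
    scale = sensitivity / epsilon
    return scale * math.log(1.0 / beta)


def epsilon_for_error(sensitivity, alpha, beta):
    """Invert the bound: the epsilon needed to reach error alpha
    with confidence 1 - beta."""
    if sensitivity <= 0 or alpha <= 0 or not (0 < beta < 1):
        raise ValueError("need sensitivity > 0, alpha > 0 and 0 < beta < 1")
    return sensitivity * math.log(1.0 / beta) / alpha


def union_bound_total_error(sensitivity, epsilons, beta):
    """Error bound for the sum of several noisy answers.

    Each query receives failure probability beta / k, so by the union
    bound all k individual bounds hold together with probability at
    least 1 - beta. No independence assumption is needed, which makes
    this bound loose.
    """
    k = len(epsilons)
    if k == 0:
        raise ValueError("need at least one query")
    return sum(laplace_error_bound(sensitivity, e, beta / k) for e in epsilons)
```

For instance, `laplace_error_bound(1.0, 0.5, 0.05)` evaluates to roughly 5.99, and `epsilon_for_error` recovers the budget of 0.5 from that error. The union bound grows quickly with the number of queries, which is why knowing that the noise values are independent can yield tighter estimates.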

In the context of differentially-private systems, the sensitivity of a function determines the amount of noise needed to achieve a desired level of privacy. However, establishing the sensitivity of arbitrary functions is non-trivial. Consequently, systems such as DPella provide only a limited set of functions -- whose sensitivity is known -- to apply over sensitive data, thus hindering the expressiveness of the language. To overcome this limitation, we propose a new approach to derive proofs of sensitivity in programming languages with support for polymorphism. Our approach enriches base types with information about the metric relation between values and applies parametricity to derive a proof of a function's sensitivity. These ideas are formalized in a sound calculus and implemented as a Haskell library called Spar, enabling programmers to prove the sensitivity of their functions through type-checking alone.
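To illustrate how sensitivity determines the amount of noise, here is a minimal sketch of the standard Laplace mechanism, where the noise scale is the sensitivity divided by epsilon. The helper names are hypothetical and unrelated to DPella or Spar, the sensitivities are supplied by hand rather than derived from types, and neighbouring datasets are assumed to differ by adding or removing one record. Naive floating-point sampling like this is not suitable for production use.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two independent
    exponential variables with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value, sensitivity, epsilon, rng):
    """Release true_value with noise of scale sensitivity / epsilon."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def private_count(rows, predicate, epsilon, rng):
    """Counting query: adding or removing one row changes the count
    by at most 1, so the sensitivity is 1."""
    count = sum(1 for row in rows if predicate(row))
    return laplace_mechanism(count, 1.0, epsilon, rng)


def private_clipped_sum(values, upper, epsilon, rng):
    """Sum of values clipped to [0, upper]: one record contributes at
    most `upper`, so the sensitivity is `upper`."""
    clipped = [min(max(v, 0.0), upper) for v in values]
    return laplace_mechanism(sum(clipped), upper, epsilon, rng)


if __name__ == "__main__":
    rng = random.Random()
    ages = [23, 35, 41, 58, 62, 19]
    print(private_count(ages, lambda a: a >= 40, epsilon=0.5, rng=rng))
    print(private_clipped_sum(ages, upper=100.0, epsilon=0.5, rng=rng))
```

The clipped sum shows why known sensitivities matter: without the clipping step the sum has unbounded sensitivity, and no finite noise scale would suffice.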

Overall, this thesis contributes to the development of expressive programming frameworks for data analysis with privacy and accuracy guarantees. The proposed approaches are feasible and effective, as demonstrated through the implementation of DPella and Spar.

Haskell

accuracy

parametricity

Program reasoning

Functional Programming

concentration bounds

differential privacy

EDIT-EA Lecture Hall, Rännvägen 6B, Chalmers

Opponent: Danfeng Zhang, Department of Computer Science and Engineering, Pennsylvania State University, United States of America

Online disputation

Author

Elisabet Lobo Vesga

Chalmers, Computer Science and Engineering, Information Security

A Programming Language for Data Privacy with Accuracy Estimations

ACM Transactions on Programming Languages and Systems, Vol. 43 (2021)

Journal article

Lobo-Vesga, E, Russo, A, Gaboardi, M. Sensitivity by Parametricity: Simple Sensitivity Proofs for Differential Privacy

In today's digital age, vast amounts of personal data are collected by various organizations, including governments, businesses, and social media platforms. This data can be used for different purposes, such as marketing, research, and even surveillance. However, collecting and using personal data can also pose significant risks to individuals' privacy, particularly if the data is mishandled or falls into the wrong hands. For example, data breaches can lead to identity theft, financial fraud, and other crimes that can cause significant harm to individuals. One solution to this problem is differential privacy, which is a technique used to protect the privacy of individuals while still allowing organizations to use their data for research or other purposes. Differential privacy involves adding noise or randomness to the data to obscure the presence of the individuals in the dataset. The main challenge of differential privacy is finding the right balance between privacy and accuracy. Adding too much noise can result in inaccurate results, while too little noise can compromise privacy.

In this thesis, we explore different programming language techniques for designing and deploying expressive differentially-private systems, where data analysts can reason about the privacy-accuracy trade-offs. In particular, we use information-flow control techniques to keep track of various privacy-related aspects of a program's implementation without having to execute it. With this approach, practitioners can determine the privacy-accuracy trade-offs of their analyses before accessing any sensitive data.

Octopi: Secure Programming for the Internet of Things

Swedish Foundation for Strategic Research (SSF) (RIT17-0023), 2018-03-01 -- 2023-02-28.

WebSec: Security in Web-Driven Systems

Swedish Foundation for Strategic Research (SSF) (RIT17-0011), 2018-03-01 -- 2023-02-28.

Areas of Advance

Information and Communication Technology

Subject Categories (SSIF 2011)

Computer and Information Science

ISBN

978-91-7905-811-1

Doctoral theses at Chalmers University of Technology. New series: 5277

Publisher

Chalmers