Disclosure risk measurement of anonymized datasets after probabilistic attacks
Abstract
We present a unified metric for analyzing the risk of disclosing anonymized datasets. Datasets containing privacy-sensitive information often need to be shared with users who are not authorized to access the raw data, so that the data's valuable statistical properties can still be used. Anonymizing the actual data makes such sharing possible while preserving both its statistical properties and its privacy. A risk of disclosure remains, however, because a hacker may perform a de-anonymization attack on a released dataset to breach privacy. Existing metrics for analyzing this risk were established in the context of infeasibility attacks, where every consistent matching (i.e., every feasible mapping between the actual data and the anonymized data) appears equally likely to the hacker. In practice, the hacker may have background knowledge that lets them assign unequal probabilities to the matchings. We use these unequal probabilities to compute the expected closeness of the matchings to the actual mapping used for anonymization. We find that our metric gives decision makers a more practical risk assessment but has high computational complexity. We therefore propose an efficient heuristic for the metric, analyze its accuracy, and show that it yields an estimate very close to the exact metric.
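To make the idea of an "expected closeness over probabilistic matchings" concrete, here is a minimal brute-force sketch. It is not the authors' metric or heuristic. It rests on several assumptions that are not in the abstract: closeness is the number of records a matching maps to their true anonymized counterpart; `feasible[i][j]` marks whether actual record `i` could correspond to anonymized record `j`; and the hacker's background knowledge is a hypothetical weight matrix `weights`, where a matching's probability is proportional to the product of its pair weights. With uniform weights this reduces to the equal-likelihood (infeasibility-attack) setting. Enumerating all `n!` permutations also shows why the exact computation becomes expensive.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def consistent_matchings(feasible: Sequence[Sequence[bool]]) -> List[Tuple[int, ...]]:
    """Return every one-to-one mapping (as a permutation) that uses only feasible pairs.

    Assumption: feasible[i][j] is True if actual record i may map to anonymized record j.
    Brute force over all n! permutations, so only usable for tiny n.
    """
    n = len(feasible)
    return [
        p for p in permutations(range(n))
        if all(feasible[i][p[i]] for i in range(n))
    ]


def expected_closeness(
    weights: Sequence[Sequence[float]],
    feasible: Sequence[Sequence[bool]],
    actual: Sequence[int],
) -> float:
    """Expected number of correctly mapped records under a hypothetical probability model.

    Assumptions (illustrative only):
      * P(matching) is proportional to the product of weights[i][j] over its pairs.
      * closeness(matching) = number of i with matching[i] == actual[i].
    """
    matchings = consistent_matchings(feasible)
    # Unnormalized score of each consistent matching.
    scores = []
    for m in matchings:
        score = 1.0
        for i, j in enumerate(m):
            score *= weights[i][j]
        scores.append(score)

    total = sum(scores)
    if total == 0:
        raise ValueError("no consistent matching has positive probability")

    # Probability-weighted average of each matching's closeness to the true mapping.
    expected = 0.0
    for m, score in zip(matchings, scores):
        closeness = sum(1 for i, j in enumerate(m) if j == actual[i])
        expected += (score / total) * closeness
    return expected


if __name__ == "__main__":
    n = 3
    all_feasible = [[True] * n for _ in range(n)]
    identity = list(range(n))
    uniform = [[1.0] * n for _ in range(n)]
    # Background knowledge that favors the true pairs (hypothetical numbers).
    informed = [[2.0 if i == j else 1.0 for j in range(n)] for i in range(n)]
    print(expected_closeness(uniform, all_feasible, identity))   # 1.0
    print(expected_closeness(informed, all_feasible, identity))  # 1.875
```

In the example, the uniform case gives one expected correct mapping (the expected number of fixed points of a random permutation). Skewing the weights toward the true pairs raises the expected closeness to 1.875, which illustrates how background knowledge increases the measured disclosure risk.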

