Methods to achieve t-closeness for privacy preserving data publishing
Abstract
Privacy-preserving data publishing is an area of research focused on developing methods for anonymizing sensitive relational data so that it can be published without compromising the privacy of the individuals the data represents. The t-closeness technique is one of the most popular techniques for preserving individual privacy in data. It involves generalizing and suppressing some attributes of a given table after partitioning the set of all records of that table into equivalence classes that satisfy a certain constraint. We present three methods for anonymizing datasets that address the drawbacks of existing methods. The first method is a new way to partition the set of records of a table into such equivalence classes, and it has several advantages over existing methods for this task. The classes it generates are near-optimal, in that they satisfy the t-closeness constraint for even the "smallest" t value for which t-closeness is achievable and useful for the given table, thereby providing the highest amount of privacy. The second method anonymizes data with multiple sensitive attributes such that the privacy parameter t for each attribute can be selected individually. It partitions the data into fragments and selects appropriate numbers of records from each fragment to create equivalence classes whose sensitive attribute distributions are guaranteed to be t-close. This method can easily be generalized to an arbitrary number of sensitive attributes and to sensitive attributes with continuous domains. The third method is an algorithm for generating equivalence classes in the presence of multiple sensitive attributes. The equivalence classes it generates satisfy t-closeness for even the smallest t value for which t-closeness is achievable and useful for the given dataset, thereby providing the highest possible amount of privacy.
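To make the constraint on equivalence classes concrete, the following minimal Python sketch checks whether a given partition is t-close. It is an illustration only, not any of the three methods presented in this work. It assumes a single categorical sensitive attribute and measures the distance between each class's distribution and the whole-table distribution with the variational distance (half the sum of absolute differences). The function names distribution, variational_distance, smallest_t and is_t_close, and the sample values, are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Return the relative frequency of each sensitive value."""
    counts = Counter(values)
    total = len(values)
    return {value: count / total for value, count in counts.items()}


def variational_distance(p, q):
    """Half the sum of absolute differences between two distributions.

    Assumption: the sensitive attribute is categorical and all pairs of
    distinct values are treated as equally far apart.
    """
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def smallest_t(classes):
    """Smallest t for which this particular partition is t-close."""
    overall = distribution([v for cls in classes for v in cls])
    return max(variational_distance(distribution(cls), overall)
               for cls in classes)


def is_t_close(classes, t):
    """True if every class's distribution is within t of the table's."""
    return smallest_t(classes) <= t


# Each inner list holds the sensitive values of one equivalence class.
balanced = [["flu", "cancer"], ["flu", "cancer"]]
skewed = [["flu", "flu"], ["cancer", "cancer"]]

print(smallest_t(balanced))
print(smallest_t(skewed))
```

In the balanced partition every class mirrors the whole table, so it is t-close for any t. In the skewed partition each class reveals the sensitive value outright, and the partition is only t-close for a comparatively large t.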