A method for t-close anonymization in the presence of multiple numerical sensitive attributes
Abstract
Privacy-preserving data publishing is an area of research focused on developing methods for anonymizing sensitive relational data so that it can be published without compromising the privacy of the individuals the data represents. Attributes that need not be discarded outright are categorized as either quasi-identifying or sensitive. Under t-closeness, a privacy guarantee that has recently gained popularity, the data is partitioned into equivalence classes in which the quasi-identifying attributes of the records are made indistinguishable from one another. To protect against skewness and similarity attacks, the distribution of each sensitive attribute within every equivalence class is guaranteed to be within a given threshold t of its distribution in the whole table. Although most real-world data include multiple sensitive attributes, the majority of existing t-closeness algorithms are suitable only for data with one sensitive attribute. We present a method for anonymizing relational data with two discrete numerical sensitive attributes such that the privacy parameter t can be selected individually for each. Our method partitions the data into fragments and selects appropriate numbers of records from each fragment to create equivalence classes whose sensitive attribute distributions are guaranteed to be t-close. The method generalizes readily to an arbitrary number of sensitive attributes and to sensitive attributes with continuous domains. While finding an optimal anonymization is NP-hard, our method finds an acceptable anonymization in polynomial time.
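To make the guarantee concrete, the following minimal sketch checks whether a single equivalence class is t-close for two numerical sensitive attributes, each with its own threshold. It does not reproduce the fragment-based construction described in the abstract. It only verifies the property, and it assumes the ordered-distance Earth Mover's Distance commonly used for numerical attributes in the t-closeness literature, which the abstract does not specify. The attribute names `salary` and `visits`, the sample records, and the threshold values are hypothetical.

```python
from collections import Counter

def ordered_emd(class_values, table_values):
    """Earth Mover's Distance with ordered (rank-based) ground distance.

    Assumed metric: the abstract does not name the distance it uses.
    Returns a value in [0, 1].
    """
    domain = sorted(set(table_values))  # ordered domain of the attribute
    m = len(domain)
    if m < 2:
        return 0.0  # a single-valued domain cannot differ
    p = Counter(class_values)  # counts inside the equivalence class
    q = Counter(table_values)  # counts in the whole table
    n_p, n_q = len(class_values), len(table_values)
    running = 0.0  # cumulative difference between the two distributions
    total = 0.0
    for v in domain[:-1]:  # the last cumulative difference is always zero
        running += p[v] / n_p - q[v] / n_q
        total += abs(running)
    return total / (m - 1)

def is_t_close(eq_class, table, thresholds):
    """True if every sensitive attribute meets its own threshold t."""
    return all(
        ordered_emd([r[a] for r in eq_class], [r[a] for r in table]) <= t
        for a, t in thresholds.items()
    )

# Hypothetical table with two discrete numerical sensitive attributes.
table = [
    {"salary": s, "visits": v}
    for s, v in zip(range(3, 12), [1, 1, 1, 2, 2, 2, 3, 3, 3])
]
thresholds = {"salary": 0.2, "visits": 0.1}  # t chosen per attribute

clustered = table[0:3]                    # low salaries, identical visit counts
spread = [table[0], table[4], table[8]]   # values spread across both domains

clustered_ok = is_t_close(clustered, table, thresholds)
spread_ok = is_t_close(spread, table, thresholds)
```

In this sketch the class of three adjacent records concentrates on low values of both attributes and fails both thresholds, while the class drawn from across the table satisfies them. That difference is the intuition behind selecting records from different fragments of the data when building each class.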