A method for generating near optimal t-closed equivalence classes
Vitalapura, Spandana Siddaramanagowd
A huge volume of data emanates from various digital sources. This data, which often contains sensitive information, is stored and released for purposes that serve the common good through the advancement of knowledge. Organizations often need or want to publish a subset of the sensitive data they collect for regulatory or research purposes, but any misuse of this information poses a critical threat to an individual's identity. Privacy Preserving Data Publishing is the research area concerned with protecting the privacy of published data. It focuses on developing methods of anonymizing sensitive data such that it can be published without compromising the privacy of the individuals the data represents. One privacy guarantee that has recently gained popularity, t-closeness, partitions the data into equivalence classes in which the quasi-identifying attributes of the contained records are made indistinguishable from one another and the distribution of sensitive attributes within each equivalence class is guaranteed to be within a given threshold t of the distribution in the whole table.
In this thesis, we present a method to achieve t-closeness for a single sensitive attribute. The method yields equal-sized equivalence classes with uniformly distributed sensitive attribute values, and each equivalence class satisfies a low t value. The first step of our algorithm forms a frequency distribution table from the input data, in which the sensitive attribute values are arranged in descending order of frequency. The second step is stacking and dealing: the records stacked in the frequency distribution table are dealt cyclically to the equivalence classes. The third step finds the distribution of sensitive attribute values in each equivalence class and uses the Earth Mover's Distance to find the t value of each equivalence class.
Compared to existing methods for achieving t-closeness, this method generates equal-sized equivalence classes with uniformly distributed sensitive attribute values, which also keeps data loss minimal and data utility high.
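The three steps described in the abstract can be illustrated with a minimal Python sketch. This is not the thesis implementation. The function names, the representation of records as `(id, sensitive_value)` tuples, and the number of classes `k` are hypothetical. The sketch also assumes a categorical sensitive attribute with equal ground distance between values, for which the Earth Mover's Distance reduces to half the L1 distance between the two distributions. The thesis may use a different ground distance.

```python
from collections import Counter


def frequency_table(values):
    """Step 1: (value, count) pairs, most frequent value first."""
    counts = Counter(values)
    # Ties are broken by the value's string form so the order is deterministic.
    return sorted(counts.items(), key=lambda kv: (-kv[1], str(kv[0])))


def stack_and_deal(records, sensitive_of, k):
    """Step 2: stack records by descending value frequency, then deal
    them cyclically (round-robin) into k equivalence classes."""
    if not 1 <= k <= len(records):
        raise ValueError("k must be between 1 and the number of records")
    table = frequency_table([sensitive_of(r) for r in records])
    rank = {value: i for i, (value, _) in enumerate(table)}
    # sorted() is stable, so records sharing a value keep their input order.
    stacked = sorted(records, key=lambda r: rank[sensitive_of(r)])
    classes = [[] for _ in range(k)]
    for i, record in enumerate(stacked):
        classes[i % k].append(record)
    return classes


def distribution(values, domain):
    """Relative frequency of each domain value within `values`."""
    counts = Counter(values)
    return [counts[v] / len(values) for v in domain]


def emd_equal_distance(p, q):
    """EMD under the assumed equal ground distance: half the L1 distance."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def t_values(classes, sensitive_of):
    """Step 3: distance of each class's distribution from the whole table's."""
    all_values = [sensitive_of(r) for c in classes for r in c]
    domain = sorted(set(all_values))
    overall = distribution(all_values, domain)
    return [
        emd_equal_distance(
            distribution([sensitive_of(r) for r in c], domain), overall
        )
        for c in classes
    ]
```

For example, calling `stack_and_deal(records, lambda r: r[1], k)` and passing the result to `t_values` gives one t value per class. Because dealing is cyclic, class sizes differ by at most one record, and each class receives a near-proportional share of every sensitive value. The sketch does not generalize the quasi-identifying attributes.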
Thesis (M.S.)-- Wichita State University, College of Engineering, Dept. of Electrical Engineering and Computer Science