SoC Research Publications

Recent Submissions

  • Item
    Applying large language model artificial intelligence for retina International Classification of Diseases (ICD) coding
    (AME Publishing Company, 2023-10) Ong, Joshua; Kedia, Nikita; Harihar, Sanjana; Vupparaboina, Sharat Chandra; Singh, Sumit Randhir; Venkatesh, Ramesh; Vupparaboina, Kiran; Bollepalli, Sandeep Chandra; Chhablani, Jay
    Background: Large language models (LLMs) such as ChatGPT have emerged as a potentially powerful application in medicine. One of ChatGPT's strengths is its ability to analyze text and perform certain tasks. International Classification of Diseases (ICD) codes are universally utilized in medicine and have served as a uniform platform for insurance and billing. However, coding ICDs after each patient encounter is time-consuming for physicians, particularly in fast-paced clinics such as retina clinics. Additionally, searching for the most specific, correct ICD code may take additional time, leading providers to opt for more general ICD codes. LLMs may help to relieve this burden by analyzing notes written by a provider and automatically generating an ICD code that can be used for the encounter. Methods: In this study, we evaluate the ability of ChatGPT to analyze retina encounters and to generate ICD codes for the encounter without any feedback. Text for mock retina clinic encounters covering various types of visits, including new patient visits, return visits, post-operative visits, and injection-only visits, was generated by three retina specialists. Results: A total of 181 retina encounters were evaluated, comprising 84 right eyes and 97 left eyes. A total of 597 ICD codes were generated, of which 305 were retina codes (1.68 retina codes per eye). In total, 127/181 (70%) of responses were true positives, meaning that at least one generated code was a correct code for the encounter that a retina specialist could choose. In total, 54/181 (30%) of responses did not generate any correct code from the text. Further analysis showed that ChatGPT generated a completely correct response in 106 of 181 encounters (59%), with the remaining 75 encounters (41%) containing some form of incorrect ICD code even if the response included the correct diagnosis. Conclusions: This pilot study showcases ChatGPT's ability, without feedback, to potentially alleviate physician burden in coding ICDs by generating a selection of codes. Further research is required to validate this potential use of LLMs to reduce burden on providers through ICD code generation.
  • Item
    Deep Generative Views to Mitigate Gender Classification Bias Across Gender-Race Groups
    (Springer Science and Business Media Deutschland GmbH, 2023) Ramachandran, Sreeraj; Rattani, Ajita
    Published studies have reported bias in automated face-based gender classification algorithms across gender-race groups. Specifically, unequal accuracy rates were obtained for women and dark-skinned people. To mitigate the bias of gender classifiers, the vision community has developed several strategies. However, the efficacy of these mitigation strategies has been demonstrated for only a limited number of races, mostly Caucasian and African-American. Further, these strategies often involve a trade-off between bias and classification accuracy. To further advance the state of the art, we leverage the power of generative views, structured learning, and evidential learning towards mitigating gender classification bias. We demonstrate the superiority of our bias mitigation strategy in improving classification accuracy and reducing bias across gender-racial groups through extensive experimental validation, resulting in state-of-the-art performance in intra- and cross-dataset evaluations.
  • Item
    GBDF: Gender Balanced DeepFake Dataset Towards Fair DeepFake Detection
    (Springer Science and Business Media Deutschland GmbH, 2023) Nadimpalli, Aakash Varma; Rattani, Ajita
    Facial forgery by deepfakes has raised severe societal concerns. Several solutions have been proposed by the vision community to effectively combat misinformation on the internet via automated deepfake detection systems. Recent studies have demonstrated that facial analysis-based deep learning models can discriminate based on protected attributes. For the commercial adoption and massive roll-out of deepfake detection technology, it is vital to evaluate and understand the fairness (the absence of any prejudice or favoritism) of deepfake detectors across demographic variations such as gender and race, because a performance differential between demographic sub-groups would impact millions of people in the disadvantaged sub-group. This paper aims to evaluate the fairness of deepfake detectors across males and females. However, existing deepfake datasets are not annotated with demographic labels to facilitate fairness analysis. To this end, we manually annotated existing popular deepfake datasets with gender labels and evaluated the performance differential of current deepfake detectors across gender. Our analysis of the gender-labeled version of the datasets suggests that (a) current deepfake datasets have a skewed distribution across gender, and (b) commonly adopted deepfake detectors obtain unequal performance across gender, mostly performing better on males than on females. Finally, we contribute a gender-balanced and annotated deepfake dataset, GBDF, to mitigate the performance differential and to promote research and development towards fairness-aware deepfake detectors. The GBDF dataset is publicly available at:
  • Item
    Generating t-Closed Partitions of Datasets with Multiple Sensitive Attributes
    (Institute of Electrical and Electronics Engineers Inc., 2023-04) Gowda, Vikas Thammanna; Bagai, Rajiv
    The popular t-closeness privacy model requires that the "distance" between the distribution of sensitive attribute values in any given raw dataset and their distribution in every equivalence class created not exceed some privacy threshold t. While most existing methods for achieving t-closeness handle data with just a single sensitive attribute, datasets with multiple sensitive attributes are very common in the real world. Here we demonstrate a technique for creating equivalence classes from a dataset containing multiple sensitive attributes. The equivalence classes generated by our method satisfy t-closeness without taking any t values as input. While generalization of quasi-identifier attributes leads to information loss, the sizes of the generated classes are nearly identical, differing by at most one, which results in lower information loss. Although generating classes with minimum information loss for a given value of t is NP-hard, our method generates its equivalence classes in O(r log r) time.
  • Item
    Measuring Economic Benefits of Built Environment Accessibility Technologies for People with Disabilities
    (NLM (Medline), 2023-08) Joseph, Siny; Namboodiri, Vinod
    Given the challenges of wayfinding in large indoor built environments, especially for persons with disabilities (PWDs), a new class of accessible technologies called built environment accessible technologies (BEATs) is being developed. Such technologies are envisioned to help achieve product and opportunity parity for PWDs. The impact and adoption of these BEATs depend largely on clear and quantifiable (tangible and intangible) economic benefits accruing to end-users and stakeholders. This paper describes the results of a survey conducted to measure the potential benefits, in terms of quality of life and quality of work life (work productivity), of increased accessibility provisions within built environments as they relate to navigation for PWDs and those without disabilities. Results of this work indicate that BEATs have the greatest potential to improve mobility and exploratory activities for people with disabilities, to improve exploratory activities for people without disabilities, and to improve job security for everyone.
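
Illustrative note on the t-closeness entry above ("Generating t-Closed Partitions of Datasets with Multiple Sensitive Attributes"): the condition described in that abstract can be sketched as a simple check. This is not the authors' partitioning method. It only tests whether a given partition is t-close, and it rests on assumptions that are not in the abstract: the distance is taken to be the variational (total variation) distance, each sensitive attribute is checked independently, and all function names, attribute names, and records are hypothetical.

```python
from collections import Counter


def distribution(values):
    """Relative frequency of each value in a list."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}


def variational_distance(p, q):
    """Half the L1 distance between two discrete distributions (assumed metric)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def max_distance(records, classes, sensitive_attrs):
    """Largest distance between the dataset-wide distribution and any class
    distribution, taken over every sensitive attribute (checked independently,
    which is an assumption of this sketch)."""
    worst = 0.0
    for attr in sensitive_attrs:
        overall = distribution([r[attr] for r in records])
        for cls in classes:
            local = distribution([r[attr] for r in cls])
            worst = max(worst, variational_distance(overall, local))
    return worst


def is_t_close(records, classes, sensitive_attrs, t):
    """True if no equivalence class drifts further than t from the dataset."""
    return max_distance(records, classes, sensitive_attrs) <= t


# Hypothetical toy dataset with two sensitive attributes.
records = [
    {"disease": "flu", "salary": "low"},
    {"disease": "flu", "salary": "high"},
    {"disease": "cancer", "salary": "low"},
    {"disease": "cancer", "salary": "high"},
]
attrs = ["disease", "salary"]

# Each class mirrors the overall distribution of both attributes.
balanced = [[records[0], records[3]], [records[1], records[2]]]
# Each class contains a single disease value, so "disease" is skewed.
skewed = [[records[0], records[1]], [records[2], records[3]]]

print(is_t_close(records, balanced, attrs, t=0.2))
print(is_t_close(records, skewed, attrs, t=0.2))
```

In this toy case both partitions have equal-sized classes, yet only the balanced one keeps every class distribution within the threshold, which shows why class size alone does not determine t-closeness.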