Investigating annotator bias in large language models for hate speech detection
Das, Amit ; Zhang, Zheng ; Hasan, Najib ; Sarkar, Souvika ; Jamshidi, Fatemeh ; Bhattacharya, Tathagata ; Rahgouy, Mostafa ; Raychawdhary, Nilanjana ; Feng, Dongji ; Jain, Vinija ; Chadha, A. ; Sandage, M. ; Pope, L. ; Dozier, G. ; Seals, C.
Issue Date
2024-10-12
Type
Conference paper
Keywords
Large language models, Hate speech detection, Annotator bias
Citation
Das, A., Zhang, Z., Hasan, N., Sarkar, S., Jamshidi, F., Bhattacharya, T., Rahgouy, M., Raychawdhary, N., Feng, D., Jain, V., Chadha, A., Sandage, M., Pope, L., Dozier, G., & Seals, C. (2024). Investigating annotator bias in large language models for hate speech detection. NeurIPS 2024 Workshop SafeGenAi. https://openreview.net/forum?id=Epo8F2pkXp
Abstract
Data annotation, the practice of assigning descriptive labels to raw data, is pivotal in optimizing the performance of machine learning models. However, it is a resource-intensive process susceptible to biases introduced by annotators. The emergence of sophisticated Large Language Models (LLMs) presents a unique opportunity to modernize and streamline this complex procedure. While existing research extensively evaluates the efficacy of LLMs as annotators, this paper delves into the biases present in LLMs when annotating hate speech data. Our research contributes to understanding biases in four key categories: gender, race, religion, and disability, using four LLMs: GPT-3.5, GPT-4o, Llama-3.1, and Gemma-2. We analyze annotator biases by specifically targeting highly vulnerable groups within these categories. Furthermore, we conduct a comprehensive examination of potential factors contributing to these biases by scrutinizing the annotated data. We introduce our custom hate speech detection dataset, HateBiasNet, to conduct this research. Additionally, we perform the same experiments on the ETHOS dataset (Mollas et al., 2022) for comparative analysis. This paper serves as a crucial resource, guiding researchers and practitioners in harnessing the potential of LLMs for data annotation, thereby fostering advancements in this critical field.
Description
Content Warning: This article features hate speech examples that may be disturbing to some readers.
The HateBiasNet dataset is available here: https://github.com/AmitDasRup123/HateBiasNet
Publisher
NeurIPS 2024 Workshop SafeGenAi
