Science Cast

CReHate: Cross-cultural Re-annotation of English Hate Speech Dataset

Nayeon Lee, February 26, 2024, 11:36am

This paper is a preprint (available on arXiv) and has not been certified by peer review.

Authors

Nayeon Lee, Chani Jung, Junho Myung, Jiho Jin, Juho Kim, Alice Oh

Abstract

English datasets predominantly reflect the perspectives of certain nationalities, which can lead to cultural biases in models and datasets. This is particularly problematic in highly subjective tasks such as hate speech detection. To examine how individuals from different countries perceive hate speech, we introduce CReHate, a cross-cultural re-annotation of a sample of the Social Bias Inference Corpus (SBIC). The dataset includes annotations from five countries: Australia, Singapore, South Africa, the United Kingdom, and the United States. Our statistical analysis reveals significant differences by nationality: only 59.4% of the samples achieve consensus among all countries. We also introduce a culturally sensitive hate speech classifier, trained via transfer learning, that can capture the perspectives of different nationalities. These findings underscore the need to re-evaluate certain aspects of NLP research, especially given the nuanced nature of hate speech in English.
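The cross-country agreement figure can be read as a simple proportion: a sample counts toward consensus only if every country assigns it the same label. The sketch below assumes each sample has already been reduced to one binary label per country (for example, by majority vote among that country's annotators). That aggregation step, the country codes, the function name `consensus_rate`, and the toy data are illustrative assumptions, not the authors' procedure or data.

```python
from typing import Dict, List

# Hypothetical country codes for the five annotator groups in the abstract.
COUNTRIES = ["AU", "SG", "ZA", "GB", "US"]  # Australia, Singapore, South Africa, UK, US


def consensus_rate(labels: List[Dict[str, int]]) -> float:
    """Return the fraction of samples whose per-country labels are all identical.

    Assumes each sample maps every country code to one aggregated 0/1 label
    (1 = hateful). This aggregation is an assumption for illustration only.
    """
    if not labels:
        return 0.0
    # A sample reaches consensus when the set of its country labels has one element.
    agreed = sum(1 for sample in labels if len({sample[c] for c in COUNTRIES}) == 1)
    return agreed / len(labels)


# Hypothetical toy data, not drawn from CReHate.
toy = [
    {"AU": 1, "SG": 1, "ZA": 1, "GB": 1, "US": 1},  # all agree: hateful
    {"AU": 0, "SG": 0, "ZA": 0, "GB": 0, "US": 0},  # all agree: not hateful
    {"AU": 1, "SG": 0, "ZA": 1, "GB": 1, "US": 1},  # one country disagrees
    {"AU": 0, "SG": 1, "ZA": 1, "GB": 0, "US": 0},  # split decision
]

print(f"{consensus_rate(toy):.1%}")  # 50.0%
```

Because a single dissenting country breaks consensus, this all-countries criterion is strict; a figure like 59.4% therefore says that roughly four in ten samples drew a different label from at least one country.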
