Hannah Rose Kirk
Title | Cited by | Year
Bias out-of-the-box: An empirical analysis of intersectional occupational biases in popular generative language models
HR Kirk, Y Jun, F Volpin, H Iqbal, E Benussi, F Dreyer, A Shtedritski, ...
Advances in neural information processing systems 34, 2611-2624, 2021
131 | 2021
Auditing large language models: a three-layered approach
J Mökander, J Schuett, HR Kirk, L Floridi
AI and Ethics, 1-31, 2023
115 | 2023
DataPerf: Benchmarks for data-centric AI development
M Mazumder, C Banbury, X Yao, B Karlaš, W Gaviria Rojas, S Diamos, ...
Advances in Neural Information Processing Systems 36, 2024
74 | 2024
SemEval-2023 Task 10: Explainable detection of online sexism
HR Kirk, W Yin, B Vidgen, P Röttger
arXiv preprint arXiv:2303.04222, 2023
73 | 2023
A prompt array keeps the bias away: Debiasing vision-language models with adversarial learning
H Berg, SM Hall, Y Bhalgat, W Yang, HR Kirk, A Shtedritski, M Bain
Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the …, 2022
63 | 2022
The benefits, risks and bounds of personalizing the alignment of large language models to individuals
HR Kirk, B Vidgen, P Röttger, SA Hale
Nature Machine Intelligence, 1-10, 2024
47* | 2024
Hatemoji: A test suite and adversarially-generated dataset for benchmarking and detecting emoji-based hate
HR Kirk, B Vidgen, P Röttger, T Thrush, SA Hale
Proceedings of the 2022 Conference of the North American Chapter of the …, 2021
45 | 2021
Handling and Presenting Harmful Text in NLP
HR Kirk, A Birhane, B Vidgen, L Derczynski
EMNLP Findings, 2022
31* | 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
C Borchers, DS Gala, B Gilburt, E Oravkin, W Bounsi, YM Asano, HR Kirk
Proceedings of the 4th workshop on gender bias in natural language …, 2022
24 | 2022
XSTest: A test suite for identifying exaggerated safety behaviours in large language models
P Röttger, HR Kirk, B Vidgen, G Attanasio, F Bianchi, D Hovy
arXiv preprint arXiv:2308.01263, 2023
23 | 2023
Memes in the Wild: Assessing the Generalizability of the Hateful Memes Challenge Dataset
HR Kirk, Y Jun, P Rauba, G Wachtel, R Li, X Bai, N Broestl, M Doff-Sotta, ...
Proceedings of the 5th Workshop on Online Abuse and Harms (WOAH 2021), 2021
23 | 2021
Assessing language model deployment with risk cards
L Derczynski, HR Kirk, V Balachandran, S Kumar, Y Tsvetkov, MR Leiser, ...
arXiv preprint arXiv:2303.18190, 2023
16 | 2023
The past, present and better future of feedback learning in large language models for subjective human preferences and values
HR Kirk, AM Bean, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2310.07629, 2023
11 | 2023
Casteist but not racist? Quantifying disparities in large language model bias between India and the West
K Khandelwal, M Tonneau, AM Bean, HR Kirk, SA Hale
arXiv preprint arXiv:2309.08573, 2023
11 | 2023
The nuances of Confucianism in technology policy: An inquiry into the interaction between cultural and political systems in Chinese digital ethics
HR Kirk, K Lee, C Micallef
International Journal of Politics, Culture, and Society, 1-24, 2020
11 | 2020
Balancing the picture: Debiasing vision-language datasets with synthetic contrast sets
B Smith, M Farinha, SM Hall, HR Kirk, A Shtedritski, M Bain
arXiv preprint arXiv:2305.15407, 2023
8 | 2023
Is More Data Better? Re-thinking the Importance of Efficiency in Abusive Language Detection with Transformers-Based Active Learning
HR Kirk, B Vidgen, SA Hale
Proceedings of the Third Workshop on Threat, Aggression and Cyberbullying …, 2022
6 | 2022
The Empty Signifier Problem: Towards Clearer Paradigms for Operationalising "Alignment" in Large Language Models
HR Kirk, B Vidgen, P Röttger, SA Hale
arXiv preprint arXiv:2310.02457, 2023
4 | 2023
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models
P Röttger, V Hofmann, V Pyatkin, M Hinck, HR Kirk, H Schütze, D Hovy
arXiv preprint arXiv:2402.16786, 2024
3 | 2024
Adversarial Nibbler: An Open Red-Teaming Method for Identifying Diverse Harms in Text-to-Image Generation
J Quaye, A Parrish, O Inel, C Rastogi, HR Kirk, M Kahng, E van Liemt, ...
arXiv preprint arXiv:2403.12075, 2024
3* | 2024
Showing articles 1–20