Sensemaking in User-Driven Algorithm Auditing: A Case Study on Gender Bias in an Image Captioning Model

People involved

Behnoosh Mohammadzadeh, Jules Françoise, Michèle Gouiffès, and Baptiste Caramiaux.

Abstract

Non-experts increasingly engage in user-driven algorithm auditing, interacting directly with AI systems to probe, document, and reflect on biased behavior. Yet, auditing remains challenging due to model opacity and limited support for navigating and interpreting outputs. This paper explores the design and evaluation of interfaces grounded in the sensemaking framework to support non-experts in auditing gender bias in image captioning. In a between-subjects study, 60 participants audited an image captioning model using one of three interface conditions: a Baseline interface, a Masking Tool for image manipulation, or a Filtering Tool for organizing captions. Our findings show that interface design shaped what participants noticed, how they interpreted model behavior, and how they supported their hypotheses. The Image Masking Tool enabled fine-grained testing of visual cues and context, while the Text Filtering Tool revealed broader asymmetries in gendered language. We argue that incorporating sensemaking into auditing practices can advance accountability and transparency in machine learning systems.

Project description

This paper is part of Behnoosh Mohammadzadeh’s PhD work titled “Construction of Model Behavior by Novices through Interaction with Machine Learning”. The manuscript is available here: https://theses.hal.science/tel-05326181/.

This article won a Best Paper award 🏆 at CHI 2026, given to the top 1% of conference submissions.