arXiv Analytics

arXiv:2107.09234 [cs.LG]

Shared Interest: Large-Scale Visual Analysis of Model Behavior by Measuring Human-AI Alignment

Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt

Published 2021-07-20 (Version 1)

Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common first step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: a set of metrics for comparing saliency with human-annotated ground truths. By providing quantitative descriptors, Shared Interest allows ranking, sorting, and aggregation of inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, including focusing on a sufficient subset of ground truth features or being distracted by contextual features. Working with representative real-world users, we show how Shared Interest can be used to rapidly develop or lose trust in a model's reliability, uncover issues that are missed in manual analyses, and enable interactive probing of model behavior.
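
The abstract does not spell out the individual metrics, but the comparison it describes -- scoring how a model's saliency region aligns with a human-annotated ground-truth region -- can be illustrated with coverage-style scores over binary masks. The sketch below is a minimal illustration under that assumption; the function and score names are hypothetical and not taken from the paper text.

import numpy as np

def shared_interest_scores(saliency_mask: np.ndarray,
                           ground_truth_mask: np.ndarray) -> dict:
    """Coverage-style alignment scores between a binary saliency mask and a
    human-annotated ground-truth mask. Illustrative sketch only; the score
    names below are assumptions, not the paper's definitions."""
    s = saliency_mask.astype(bool)
    g = ground_truth_mask.astype(bool)
    intersection = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    return {
        # Overlap relative to everything either the model or the human highlighted.
        "iou": intersection / union if union else 0.0,
        # Fraction of the ground-truth features that the saliency covers.
        "ground_truth_coverage": intersection / g.sum() if g.sum() else 0.0,
        # Fraction of the salient features that fall inside the ground truth.
        "saliency_coverage": intersection / s.sum() if s.sum() else 0.0,
    }

Because each score reduces an input to a single number, inputs can be ranked, sorted, or grouped by score, which is the kind of large-scale, systematic aggregation of model behavior the abstract describes.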

Comments: 14 pages, 8 figures. For more details, see http://shared-interest.csail.mit.edu
Categories: cs.LG
Related articles:
arXiv:2307.00157 [cs.LG] (Published 2023-06-30)
The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems
arXiv:2203.02013 [cs.LG] (Published 2022-03-03)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
arXiv:2411.04430 [cs.LG] (Published 2024-11-07)
Towards Unifying Interpretability and Control: Evaluation via Intervention