arXiv Analytics

arXiv:2107.09234 [cs.LG]

Shared Interest: Large-Scale Visual Analysis of Model Behavior by Measuring Human-AI Alignment

Angie Boggust, Benjamin Hoover, Arvind Satyanarayan, Hendrik Strobelt

Published 2021-07-20 (Version 1)

Saliency methods -- techniques to identify the importance of input features on a model's output -- are a common first step in understanding neural network behavior. However, interpreting saliency requires tedious manual inspection to identify and aggregate patterns in model behavior, resulting in ad hoc or cherry-picked analysis. To address these concerns, we present Shared Interest: a set of metrics for comparing saliency with human-annotated ground truths. By providing quantitative descriptors, Shared Interest allows ranking, sorting, and aggregation of inputs, thereby facilitating large-scale systematic analysis of model behavior. We use Shared Interest to identify eight recurring patterns in model behavior, including focusing on a sufficient subset of ground truth features or being distracted by contextual features. Working with representative real-world users, we show how Shared Interest can be used to rapidly develop or lose trust in a model's reliability, uncover issues that are missed in manual analyses, and enable interactive probing of model behavior.
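
The abstract does not spell out the individual metrics, but the comparison it describes -- scoring how a model's saliency region aligns with a human-annotated ground-truth region -- can be illustrated with coverage-style scores over binary masks. The sketch below is a minimal illustration under that assumption; the function and score names are hypothetical and not taken from the paper text.

import numpy as np

def shared_interest_scores(saliency_mask: np.ndarray,
                           ground_truth_mask: np.ndarray) -> dict:
    """Coverage-style alignment scores between a binary saliency mask and a
    human-annotated ground-truth mask. Illustrative sketch only; the score
    names below are assumptions, not the paper's definitions."""
    s = saliency_mask.astype(bool)
    g = ground_truth_mask.astype(bool)
    intersection = np.logical_and(s, g).sum()
    union = np.logical_or(s, g).sum()
    return {
        # Overlap relative to everything either the model or the human highlighted.
        "iou": intersection / union if union else 0.0,
        # Fraction of the ground-truth features that the saliency covers.
        "ground_truth_coverage": intersection / g.sum() if g.sum() else 0.0,
        # Fraction of the salient features that fall inside the ground truth.
        "saliency_coverage": intersection / s.sum() if s.sum() else 0.0,
    }

Because each score reduces an input to a single number, inputs can be ranked, sorted, or grouped by score, which is the kind of large-scale, systematic aggregation of model behavior the abstract describes.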

Comments: 14 pages, 8 figures. For more details, see http://shared-interest.csail.mit.edu
Categories: cs.LG
Related articles:
arXiv:2307.00157 [cs.LG] (Published 2023-06-30)
The Effect of Balancing Methods on Model Behavior in Imbalanced Classification Problems
arXiv:2203.02013 [cs.LG] (Published 2022-03-03)
DIME: Fine-grained Interpretations of Multimodal Models via Disentangled Local Explanations
arXiv:2411.04430 [cs.LG] (Published 2024-11-07)
Towards Unifying Interpretability and Control: Evaluation via Intervention