arXiv:2411.04430 [cs.LG]
Keywords: intervention, unifying interpretability, model behavior, framework maps intermediate latent representations, popular interpretability methods-sparse autoencoders