arxiv:2411.14982

Large Multi-modal Models Can Interpret Features in Large Multi-modal Models

Published on Nov 22, 2024
· Submitted by kcz358 on Nov 25, 2024
Authors:
Bo Li, et al.
Abstract

Recent advances in Large Multimodal Models (LMMs) have led to significant breakthroughs in both academia and industry. One question that arises is how we, as humans, can understand their internal neural representations. This paper takes an initial step towards addressing this question by presenting a versatile framework to identify and interpret the semantics within LMMs. Specifically, 1) we first apply a Sparse Autoencoder (SAE) to disentangle the representations into human-understandable features. 2) We then present an automatic interpretation framework in which the open-semantic features learned by the SAE are interpreted by LMMs themselves. We employ this framework to analyze the LLaVA-NeXT-8B model using the LLaVA-OV-72B model, demonstrating that these features can effectively steer the model's behavior. Our results contribute to a deeper understanding of why LMMs excel in specific tasks, including EQ tests, and illuminate the nature of their mistakes along with potential strategies for their rectification. These findings offer new insights into the internal mechanisms of LMMs and suggest parallels with the cognitive processes of the human brain.
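To make step 2 concrete, here is a minimal sketch of such an auto-interpretation loop: gather the inputs that most strongly activate one SAE feature, then ask a larger LMM to describe what they share. All function names and the stubbed model call below are illustrative assumptions, not the paper's actual API.

```python
# Hypothetical sketch of an auto-interpretation loop; names are illustrative.
from typing import Callable, List, Tuple

def top_activating_samples(
    activations: List[Tuple[str, float]], k: int = 10
) -> List[str]:
    """Return the k samples with the highest SAE feature activation."""
    ranked = sorted(activations, key=lambda pair: pair[1], reverse=True)
    return [sample for sample, _ in ranked[:k]]

def build_interpretation_prompt(samples: List[str]) -> str:
    """Format the exemplars into a prompt for the interpreter LMM."""
    listing = "\n".join(f"- {s}" for s in samples)
    return (
        "The following inputs all strongly activate one feature of a "
        "smaller multimodal model. Describe, in one sentence, the shared "
        f"concept:\n{listing}"
    )

def interpret_feature(
    activations: List[Tuple[str, float]],
    query_lmm: Callable[[str], str],
) -> str:
    """Ask the larger LMM (e.g. LLaVA-OV-72B) to label the feature."""
    prompt = build_interpretation_prompt(top_activating_samples(activations))
    return query_lmm(prompt)

if __name__ == "__main__":
    # Stub interpreter so this sketch runs without any model weights.
    fake_lmm = lambda prompt: "A feature responding to images of dogs."
    acts = [("photo of a beagle", 9.1), ("cat on a sofa", 0.2),
            ("golden retriever puppy", 8.7)]
    print(interpret_feature(acts, fake_lmm))
```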

Community

Paper author · Paper submitter

For the first time in the multimodal domain, we demonstrate that features learned by Sparse Autoencoders (SAEs) in a smaller Large Multimodal Model (LMM) can be effectively interpreted by a larger LMM. Our work introduces the use of SAEs to analyze the open-semantic features of LMMs, providing a solution for feature interpretation across various model scales.
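For readers new to SAEs, the sketch below shows the basic shape of the approach: a linear encoder with a sparsity-inducing nonlinearity, and a linear decoder trained to reconstruct LMM activations. This is a generic SAE with an L1 penalty and assumed dimensions, not necessarily the paper's exact architecture or training objective.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """Generic SAE: encode an LMM activation x into sparse codes z,
    then reconstruct x_hat = W2 @ z, as in Eq. 2 of the paper."""

    def __init__(self, d_model: int, d_hidden: int):
        super().__init__()
        self.encoder = nn.Linear(d_model, d_hidden)               # W1, b1
        self.decoder = nn.Linear(d_hidden, d_model, bias=False)   # W2

    def forward(self, x: torch.Tensor):
        z = torch.relu(self.encoder(x))   # sparse, non-negative codes
        x_hat = self.decoder(z)           # linear map back to model space
        return x_hat, z

# Assumed dimensions, for illustration only.
sae = SparseAutoencoder(d_model=4096, d_hidden=16384)
x = torch.randn(8, 4096)                  # a batch of LMM activations
x_hat, z = sae(x)
loss = ((x - x_hat) ** 2).mean() + 1e-3 * z.abs().mean()  # recon + L1 sparsity
```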

This research is inspired by Anthropic's remarkable work on applying SAEs to interpret features in large-scale language models. In multimodal models, we discovered intriguing features that correlate with diverse semantics and can be leveraged to steer model behavior, enabling more precise control and understanding of LMM functionality.
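As a rough illustration of feature steering: adding a scaled copy of a feature's decoder direction to the model's hidden states pushes generation toward that feature's semantics. The intervention point, scale, and normalization below are assumptions for the sketch, not the paper's exact procedure.

```python
import torch

def steer(hidden: torch.Tensor, feature_dir: torch.Tensor,
          alpha: float = 8.0) -> torch.Tensor:
    """Shift hidden states along one SAE feature direction
    (a column of the decoder weight W2), scaled by alpha."""
    return hidden + alpha * feature_dir

# Illustrative shapes: 4 token positions, 4096-dim residual stream.
hidden = torch.randn(4, 4096)
feature_dir = torch.randn(4096)
feature_dir = feature_dir / feature_dir.norm()  # unit-norm direction
steered = steer(hidden, feature_dir)            # feed back into the model
```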

GitHub: https://github.com/EvolvingLMMs-Lab/multimodal-sae


Hey HF community 😊,

I am preparing a presentation on this paper, and I have a question regarding the analogy to sparse coding used to explain monosemantic features in Section 2.1, Sparse Auto-encoders for Disentanglement.

The text states the following:

In this context, $W_2$ acts as an overcomplete dictionary [1] for the input data, with its rows forming the dictionary vectors, and $z$ serving as the sparse coefficients corresponding to these vectors.

In the traditional sparse coding literature referenced ([10, 30]), the signal $x$ is typically reconstructed as $x \approx D\alpha$, where the columns of the dictionary $D$ are the dictionary vectors (atoms), and the sparse coefficients $\alpha$ scale these columns.

Looking at the SAE decoder equation $\hat{x} = W_2 \cdot z$ (Eq. 2), the reconstruction $\hat{x}$ is also formed as a linear combination of the columns of $W_2$, weighted by the sparse coefficients $z$ (i.e., $\hat{x} = \sum_i z_i \, W_2[:, i]$).

This seems to suggest that, for the synthesis analogy $\hat{x} = W_2 \cdot z$ to hold, the columns of $W_2$ (the decoder weight matrix) should be considered the dictionary vectors, rather than the rows.

Could someone please clarify this point? Was the intention for the columns of $W_2$ to be the dictionary vectors in this analogy, or is there perhaps another interpretation (maybe involving the encoder or matrix transposes) that leads to the 'rows' terminology here?
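For concreteness, here is a quick numerical check (with arbitrary small shapes) of the identity I am relying on:

```python
import torch

d_model, d_hidden = 6, 10
W2 = torch.randn(d_model, d_hidden)  # decoder weight, shape (d_model, d_hidden)
z = torch.zeros(d_hidden)
z[2], z[7] = 1.5, -0.5               # sparse coefficients

x_hat = W2 @ z                                              # Eq. 2: x_hat = W2 · z
by_columns = sum(z[i] * W2[:, i] for i in range(d_hidden))  # column combination
print(torch.allclose(x_hat, by_columns))                    # True: columns act as atoms
```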

Any help would be greatly appreciated!

Paper author

Hi, I have also seen your issue in the GitHub repo. For others who are interested, the discussion is here: https://github.com/EvolvingLMMs-Lab/multimodal-sae/issues/5

