What question did this study set out to answer?

The aim is to provide an explainability framework for multimodal large language models using Shapley values.

April 23, 2026 | Open Access

mllm-shap: A Shapley Value Explainability Platform for Text-Audio Multimodal Large Language Models

Key Points

Aims to provide a Shapley value explainability framework for multimodal LLMs that jointly process text and audio.
Introduces mllm-shap, an open-source, pip-installable Python platform for explainability in multimodal LLMs.
Implements modality-aware coalition masking and multi-turn conversation tracking.
Provides five Shapley value estimation strategies, including a Complementary Contributions estimator.
Groups audio tokens via phonetic alignment, reducing the coalition space by 10–50×.
The Complementary Contributions estimator with Neyman-optimal allocation outperforms Monte Carlo baselines.
Includes an interactive web GUI for real-time attribution visualization.

Abstract

We present mllm-shap, an open-source Python platform for researchers and ML practitioners that extends Shapley value (SV) explainability from text-only large language models to multimodal LLMs (MLLMs) that jointly process text and audio. Building on the token-level SV framework introduced by TokenSHAP, mllm-shap addresses three challenges absent in the text-only setting: (1) modality-aware coalition masking that handles the coexistence of text tokens and dense audio encoder frames within a single input, (2) multi-turn conversation tracking with per-token role and modality metadata, and (3) audio token grouping via phonetic alignment that reduces the coalition space by 10–50×. The platform ships as a pip-installable package implementing five SV estimation strategies – including a Complementary Contributions estimator with Neyman-optimal allocation that outperforms Monte Carlo baselines – together with an interactive web GUI for real-time attribution visualization. To our knowledge, mllm-shap is the first publicly available framework for complete, reproducible SV-based explainability of text-audio MLLMs. The package is MIT-licensed with full source code on GitHub and a demonstration video included as supplementary material.
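To make the coalition-based attribution described in the abstract concrete, the following is a minimal sketch of Shapley value estimation over grouped input units. It does not use the mllm-shap API: the player names (`t0`..`t2` for text tokens, `a0`/`a1` for audio groups), the `toy_value` function, and both estimator functions are hypothetical stand-ins. In a real setting the value function would score the model's response when only the units in the coalition are left unmasked. The sampler shown is plain permutation Monte Carlo, i.e. the kind of baseline the abstract compares against, not the Complementary Contributions estimator.

```python
import itertools
import math
import random

# Hypothetical players: three text tokens and two audio groups. Each audio
# group stands in for many encoder frames merged into one unit.
players = ["t0", "t1", "t2", "a0", "a1"]

# Toy additive payoffs plus one cross-modal interaction (assumed values).
WEIGHTS = {"t0": 0.5, "t1": 0.1, "t2": 0.2, "a0": 1.0, "a1": 0.4}
INTERACTION = 0.6  # extra payoff when "t0" and "a0" are both present


def toy_value(coalition):
    """Stand-in for a model score computed with only `coalition` unmasked."""
    total = sum(WEIGHTS[p] for p in coalition)
    if "t0" in coalition and "a0" in coalition:
        total += INTERACTION
    return total


def exact_shapley(players, value):
    """Exact Shapley values by enumerating every coalition of the others."""
    n = len(players)
    phi = {p: 0.0 for p in players}
    for p in players:
        others = [q for q in players if q != p]
        for k in range(n):
            # Probability weight of a coalition of size k preceding p.
            weight = math.factorial(k) * math.factorial(n - k - 1) / math.factorial(n)
            for subset in itertools.combinations(others, k):
                s = frozenset(subset)
                phi[p] += weight * (value(s | {p}) - value(s))
    return phi


def monte_carlo_shapley(players, value, num_permutations, seed=0):
    """Permutation-sampling estimate: average marginal gain over random orders."""
    rng = random.Random(seed)
    totals = {p: 0.0 for p in players}
    order = list(players)
    for _ in range(num_permutations):
        rng.shuffle(order)
        coalition = frozenset()
        previous = value(coalition)
        for p in order:
            coalition = coalition | {p}
            current = value(coalition)
            totals[p] += current - previous
            previous = current
    return {p: t / num_permutations for p, t in totals.items()}


exact = exact_shapley(players, toy_value)
approx = monte_carlo_shapley(players, toy_value, num_permutations=2000)

for p in players:
    print(f"{p}: exact={exact[p]:.3f}  monte_carlo={approx[p]:.3f}")
```

In this toy setup the interaction payoff is split evenly between `t0` and `a0`, and the attributions sum to the value of the full input minus the value of the empty input. Exact enumeration needs a number of evaluations that grows exponentially with the number of players, which is why merging audio frames into fewer groups and sampling coalitions both matter in practice.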
