What type of study is this?

This is a quantitative study.

September 20, 2025

Interpreting Pretrained Language Models via Concept Bottlenecks (Extended Abstract)

Key Points

The proposed framework makes pretrained language models more interpretable by routing their predictions through human-understandable concepts.
C3M trains robustly on both human-annotated and machine-generated concepts by combining them through a concept-level MixUp mechanism.
Empirical results indicate that the method improves the interpretability-utility trade-off, even when using limited or noisy concept annotations.
The framework uses large language models such as ChatGPT to augment concept sets and generate noisy concept labels, supporting intuitive explanations and model diagnosis.
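The key points describe mixing human-annotated and machine-generated concepts. The following is a rough illustration only. It is a minimal Python sketch that applies the generic MixUp recipe at the concept level. Two examples have their input embeddings, concept-label vectors and one-hot task labels interpolated with a single coefficient λ drawn from Beta(α, α).

The following are assumptions for illustration and not the exact C3M formulation:
- the function name `concept_mixup`
- the default `alpha=0.4`
- pairing one human-labeled example with one ChatGPT-labeled example

```python
import random
from typing import List, Optional, Tuple

Vector = List[float]


def concept_mixup(
    x_a: Vector, c_a: Vector, y_a: Vector,
    x_b: Vector, c_b: Vector, y_b: Vector,
    alpha: float = 0.4,
    lam: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[Vector, Vector, Vector, float]:
    """Hypothetical concept-level mixup of two training examples.

    x_*: input embeddings, c_*: concept-label vectors (values in [0, 1]),
    y_*: one-hot task labels. Example a might carry human-annotated concepts
    and example b noisy machine-generated ones (an illustrative assumption).
    """
    if lam is None:
        # Draw the mixing coefficient from Beta(alpha, alpha), as in generic MixUp.
        lam = (rng or random).betavariate(alpha, alpha)

    def mix(u: Vector, v: Vector) -> Vector:
        # Convex combination using the same coefficient for every component.
        return [lam * p + (1.0 - lam) * q for p, q in zip(u, v)]

    return mix(x_a, x_b), mix(c_a, c_b), mix(y_a, y_b), lam


# Usage with a fixed lambda so the result is easy to check by hand.
x, c, y, lam = concept_mixup(
    [1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0],  # hypothetical human-annotated example
    [0.0, 1.0], [0.0, 0.0, 0.0], [1.0, 0.0],  # hypothetical machine-labeled example
    lam=0.25,
)
print(x, c, y)  # [0.25, 0.75] [0.25, 0.0, 0.25] [0.75, 0.25]
```

Using one λ for inputs, concepts and labels keeps the soft targets consistent with the mixed input. How C3M actually pairs and mixes examples should be checked against the released code.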

Abstract

Pretrained language models (PLMs) achieve state-of-the-art results but often function as "black boxes", hindering interpretability and responsible deployment. Existing methods such as attention analysis often lack clarity and intuitiveness. We propose interpreting PLMs through high-level, human-understandable concepts using Concept Bottleneck Models (CBMs). This extended abstract introduces C3M (ChatGPT-guided Concept augmentation with Concept-level Mixup), a novel framework for training Concept-Bottleneck-Enabled PLMs (CBE-PLMs). C3M uses Large Language Models (LLMs) such as ChatGPT to augment concept sets and generate noisy concept labels. It combines these with a concept-level MixUp mechanism so that the model learns robustly from both human-annotated and machine-generated concepts. Empirical results show that our approach provides intuitive explanations, aids model diagnosis via test-time intervention, and improves the interpretability-utility trade-off, even with limited or noisy concept annotations. This is a concise version of Tan et al., 2024b, recipient of the Best Paper Award at PAKDD 2024. Code and data are released at https://github.com/Zhen-Tan-dmml/CBMNLP.git.
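To make the bottleneck and test-time intervention concrete, here is a toy Python sketch of a generic concept bottleneck. It works in three steps:
1. Concept scores are predicted from an encoder embedding.
2. The task label is predicted from those scores alone.
3. At test time, a human can overwrite a mispredicted concept.

The following are made up for illustration:
- the fixed vector `h`, which stands in for a PLM sentence embedding
- the three concepts (hypothetically food, service and ambiance for a restaurant review)
- all weights

A real CBE-PLM would learn these layers on top of the PLM.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_concepts(h: Vector, w_c: Matrix, b_c: Vector) -> Vector:
    """Concept layer: one sigmoid score per concept, computed from the embedding h."""
    return [sigmoid(sum(w * x for w, x in zip(row, h)) + b) for row, b in zip(w_c, b_c)]


def predict_label(concepts: Vector, w_y: Matrix, b_y: Vector) -> Tuple[int, Vector]:
    """Task head: a linear layer that sees only the concept scores (the bottleneck)."""
    logits = [sum(w * c for w, c in zip(row, concepts)) + b for row, b in zip(w_y, b_y)]
    return max(range(len(logits)), key=logits.__getitem__), logits


def intervene(concepts: Vector, corrections: Dict[int, float]) -> Vector:
    """Test-time intervention: overwrite selected concept scores with human-provided values."""
    fixed = list(concepts)
    for idx, value in corrections.items():
        fixed[idx] = value
    return fixed


# Toy, hypothetical parameters: 2-d embedding, 3 concepts, 2 classes (0 = negative, 1 = positive).
h = [1.0, -1.0]
W_C, B_C = [[2.0, 0.0], [0.0, 2.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
W_Y, B_Y = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]], [0.0, -1.4]

concepts = predict_concepts(h, W_C, B_C)        # roughly [0.88, 0.12, 0.5]
pred, _ = predict_label(concepts, W_Y, B_Y)     # 1: positive
fixed = intervene(concepts, {0: 0.0})           # a human marks concept 0 as absent
pred_fixed, _ = predict_label(fixed, W_Y, B_Y)  # 0: negative
print(pred, pred_fixed)  # 1 0
```

Because the label head only sees the concept scores, correcting a single concept directly changes the prediction. That is what makes test-time intervention usable for diagnosing a model's mistakes.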
