What type of study is this?

This is a quantitative study.

September 29, 2025 | Open Access

Multi-Scale Probabilistic Generation Theory: A Hierarchical Framework for Interpreting Large Language Models

Key Points

MSPGT provides a structured approach to understanding text generation across models.
Attention span thresholds and mutual information metrics help define semantic scale boundaries.
Local-scale interventions primarily affect lexical diversity, while global-scale perturbations affect discourse coherence.
Decoder-only models allocate more layers to intermediate and global processing, while encoder-only models emphasize local feature extraction.

Abstract

Large Transformer-based language models achieve remarkable performance but remain opaque in how they plan, structure, and realize text. We introduce Multi-Scale Probabilistic Generation Theory (MSPGT), a hierarchical framework that factorizes generation into three semantic scales (global context, intermediate structure, and local word choices) and aligns each scale with specific layer ranges in Transformer architectures. To identify scale boundaries, we propose two complementary metrics: attention span thresholds and inter-layer mutual information peaks. Across four representative models (GPT-2, BERT, RoBERTa, and T5), these metrics yield stable local/intermediate/global partitions, corroborated by probing tasks and causal interventions. We find that decoder-only models allocate more layers to intermediate and global processing, while encoder-only models emphasize local feature extraction. Through targeted interventions, we demonstrate that local-scale manipulations primarily influence lexical diversity, intermediate-scale modifications affect sentence structure and length, and global-scale perturbations impact discourse coherence, all with statistically significant effects. MSPGT thus offers a unified, architecture-agnostic method for interpreting, diagnosing, and controlling large language models, bridging the gap between mechanistic interpretability and emergent capabilities.
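To make the attention span idea concrete, here is a minimal sketch of how layers might be sorted into local, intermediate, and global scales. It is not the paper's implementation: the span definition (attention-weighted mean token distance), the function names `mean_attention_span` and `label_layers`, the threshold values, and the toy attention matrices are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Matrix = Sequence[Sequence[float]]


def mean_attention_span(attn: Matrix) -> float:
    """Attention-weighted mean distance |i - j|, averaged over query positions.

    Assumed definition for illustration; attn[i][j] is the weight that
    query token i places on key token j, with each row summing to 1.
    """
    total = 0.0
    for i, row in enumerate(attn):
        total += sum(w * abs(i - j) for j, w in enumerate(row))
    return total / len(attn)


def label_layers(spans: Sequence[float], local_max: float, global_min: float) -> List[str]:
    """Assign each layer a scale from its span using two hypothetical thresholds."""
    labels = []
    for span in spans:
        if span < local_max:
            labels.append("local")
        elif span < global_min:
            labels.append("intermediate")
        else:
            labels.append("global")
    return labels


# Toy per-layer attention maps over 4 tokens (made up, not model outputs).
identity = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
previous = [[1.0 if j == max(i - 1, 0) else 0.0 for j in range(4)] for i in range(4)]
uniform = [[0.25] * 4 for _ in range(4)]

spans = [mean_attention_span(m) for m in (identity, previous, uniform)]
labels = label_layers(spans, local_max=0.5, global_min=1.0)
print(list(zip(spans, labels)))
```

In practice the attention maps would come from a real model, averaged over heads and input samples, and the thresholds would need to be chosen from the data rather than fixed by hand as they are here.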
