March 3, 2026 · Open Access

SSM-Aware Fine-Tuning for Hybrid Mamba-Transformer Models: A Comparative Study on Granite 4.0-H-Micro

Key Points

The aim is to evaluate LoRA fine-tuning strategies for improving performance in hybrid Mamba-Transformer models.
Four fine-tuning strategies are compared systematically against an unmodified baseline.
Evaluation covers three tasks: document classification, schema mapping, and structured rule generation.
The central strategy co-trains LoRA adapters with unfrozen SSM core parameters (A_log, D, dt_bias).
Co-training improves performance across all three tasks.
Persisting the trained SSM parameters adds a further 3.6 percentage point gain in classification accuracy (55.8% vs 52.2%).
In classification, the combined effect is a cumulative 37% relative improvement over LoRA-only fine-tuning.
Gains on schema mapping and rule generation stem primarily from the co-training effect.
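The key points report classification gains on two different scales. Using the accuracies given in the abstract (55.8% with persisted SSM parameters, 52.2% without), the 3.6 figure is an absolute difference in percentage points, which corresponds to a relative gain of roughly 6.9%. The 37% figure is assumed here to follow the usual relative-improvement definition, comparing the persisted co-training run (V4) against the LoRA-only run (V2); the notation acc_V2 and acc_V4 is introduced only for illustration, and the LoRA-only accuracy itself is not stated in this section.

```latex
\Delta_{\text{abs}} = 55.8\% - 52.2\% = 3.6\ \text{pp},
\qquad
\Delta_{\text{rel}} = \frac{55.8 - 52.2}{52.2} \approx 0.069 \approx 6.9\%

\Delta_{\text{rel}}^{\mathrm{V4\ vs\ V2}}
  = \frac{\mathrm{acc}_{\mathrm{V4}} - \mathrm{acc}_{\mathrm{V2}}}{\mathrm{acc}_{\mathrm{V2}}}
  \approx 0.37
```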

Abstract

We present a systematic study of LoRA fine-tuning strategies for IBM Granite 4.0-H-Micro, a 3.2B-parameter hybrid architecture comprising 36 Mamba-2 state space layers and 4 Transformer attention layers. We evaluate four fine-tuning approaches against an unmodified baseline across three domain-specific tasks: document classification (24K examples), schema mapping (15K examples), and structured rule generation (9K examples). Our investigation proceeds in two stages. First, we find that co-training LoRA adapters with unfrozen SSM core parameters (A_log, D, dt_bias) yields consistent improvements across all tasks (V3). However, PEFT's adapter-only serialization, combined with a save-ordering issue in our training script, silently discarded the trained SSM values from the saved PEFT artifact. Second, after fixing the persistence pipeline (V4), we estimate the direct contribution of the SSM parameters: an additional 3.6 percentage point gain on classification (55.8% vs 52.2%), confirming that the co-training effect and persistent SSM adaptation are complementary mechanisms. Classification benefits most from persistent SSM changes, reaching a cumulative 37% relative improvement over LoRA-only fine-tuning (V2), while gains on schema mapping and rule generation are driven primarily by the co-training effect alone. We are not aware of prior public results that combine LoRA targeting of Mamba projections with training and persisting SSM core parameters on a publicly available hybrid Mamba-Transformer model.
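The persistence failure described above can be guarded against with a simple post-save check. The following is a minimal, framework-agnostic sketch, not the study's training script: it works on parameter names only, and the naming conventions (SSM core parameters ending in ".A_log", ".D", ".dt_bias"; LoRA weights containing "lora_") are assumptions modeled on common Mamba-2 and LoRA implementations. The helpers partition_parameters and missing_ssm_keys are hypothetical. In a real PyTorch pipeline the trained names might come from model.named_parameters() and the saved names from the keys of the serialized state dict.

```python
# Hypothetical sketch: not the study's training script. Parameter-name
# conventions below are assumptions modeled on common Mamba-2 implementations.

# SSM core parameters co-trained alongside the LoRA adapters (assumed suffixes).
SSM_CORE_SUFFIXES = (".A_log", ".D", ".dt_bias")
# Substring assumed to mark LoRA adapter weights (e.g. "lora_A", "lora_B").
LORA_MARKER = "lora_"


def partition_parameters(names):
    """Split parameter names into LoRA, SSM-core, and frozen groups."""
    groups = {"lora": [], "ssm_core": [], "frozen": []}
    for name in names:
        if LORA_MARKER in name:
            groups["lora"].append(name)
        elif name.endswith(SSM_CORE_SUFFIXES):
            groups["ssm_core"].append(name)
        else:
            groups["frozen"].append(name)
    return groups


def missing_ssm_keys(trained_names, saved_names):
    """Return trained SSM-core parameter names absent from a saved artifact."""
    trained_ssm = set(partition_parameters(trained_names)["ssm_core"])
    return sorted(trained_ssm - set(saved_names))


if __name__ == "__main__":
    # Illustrative names for one Mamba layer (hypothetical layout).
    names = [
        "layers.0.mamba.A_log",
        "layers.0.mamba.D",
        "layers.0.mamba.dt_bias",
        "layers.0.mamba.in_proj.base_layer.weight",
        "layers.0.mamba.in_proj.lora_A.weight",
        "layers.0.mamba.in_proj.lora_B.weight",
    ]
    groups = partition_parameters(names)
    # An adapter-only save keeps just the LoRA weights, so the check
    # flags every trained SSM-core parameter as missing.
    print(missing_ssm_keys(names, groups["lora"]))
    # prints ['layers.0.mamba.A_log', 'layers.0.mamba.D', 'layers.0.mamba.dt_bias']
```

Running such a check right after saving, and failing loudly when the list is non-empty, is one way to catch a silent loss of trained SSM values before evaluation rather than after.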
