We present a systematic study of LoRA fine-tuning strategies for IBM Granite 4.0-H-Micro, a 3.2B-parameter hybrid architecture comprising 36 Mamba-2 state space layers and 4 Transformer attention layers. We evaluate four fine-tuning approaches against an unmodified baseline across three domain-specific tasks: document classification (24K examples), schema mapping (15K examples), and structured rule generation (9K examples). Our investigation proceeds in two stages. First, we find that co-training LoRA adapters with unfrozen SSM core parameters (A_log, D, dt_bias) yields consistent improvements across all tasks (V3). However, PEFT's adapter-only serialization, combined with a save-ordering issue in our training script, silently discarded the trained SSM values from the saved PEFT artifact. Second, after fixing the persistence pipeline (V4), we estimate the SSM parameters' direct contribution: an additional 3.6 percentage point gain on classification (55.8% vs 52.2%), confirming that the co-training effect and persistent SSM adaptation are complementary mechanisms. Classification benefits most from persistent SSM changes, with a cumulative 37% relative improvement over LoRA-only (V2), while schema mapping and rule generation gains are driven primarily by the co-training effect alone. We are not aware of prior public results that combine LoRA targeting of Mamba projections with training and persisting SSM core parameters on a publicly available hybrid Mamba-Transformer model.
Ford Cody (Sat,) studied this question.
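The persistence failure described in the abstract can be illustrated with a toy model of checkpointing. The sketch below uses plain Python dictionaries rather than PEFT or PyTorch. It is not the authors' training script and does not reproduce PEFT's actual filtering logic. The parameter names, the `lora_` substring filter, and all helper functions are hypothetical, chosen only to show how an adapter-only save can silently drop trained A_log, D, and dt_bias values, and how saving them separately avoids the loss.

```python
# Toy illustration only: names, values, and helpers are hypothetical.
# Suffixes of the SSM core parameters named in the abstract.
SSM_SUFFIXES = ("A_log", "D", "dt_bias")


def is_ssm_core(name: str) -> bool:
    """True if the last dotted component of `name` is an SSM core parameter."""
    return name.rsplit(".", 1)[-1] in SSM_SUFFIXES


def adapter_only(state: dict) -> dict:
    """Simplified stand-in for an adapter-only save: keep only LoRA entries."""
    return {k: v for k, v in state.items() if "lora_" in k}


def ssm_core_only(state: dict) -> dict:
    """Extra artifact: the trained SSM core parameters, saved separately."""
    return {k: v for k, v in state.items() if is_ssm_core(k)}


def restore(base: dict, adapter: dict, ssm: dict) -> dict:
    """Rebuild a model state from base weights plus the saved artifacts."""
    merged = dict(base)
    merged.update(adapter)
    merged.update(ssm)
    return merged


# State after co-training: LoRA weights and SSM core values have both changed.
trained = {
    "layers.0.mixer.in_proj.weight": [1.0, 1.0],
    "layers.0.mixer.in_proj.lora_A.weight": [0.1, 0.2],
    "layers.0.mixer.in_proj.lora_B.weight": [0.3, 0.4],
    "layers.0.mixer.A_log": [1.5],
    "layers.0.mixer.D": [0.9],
    "layers.0.mixer.dt_bias": [0.05],
}

# Original base checkpoint with untrained SSM core values.
base = {
    "layers.0.mixer.in_proj.weight": [1.0, 1.0],
    "layers.0.mixer.A_log": [1.0],
    "layers.0.mixer.D": [1.0],
    "layers.0.mixer.dt_bias": [0.0],
}

saved_adapter = adapter_only(trained)
saved_ssm = ssm_core_only(trained)

# Adapter-only reload: SSM core values revert to the base checkpoint.
lossy = restore(base, saved_adapter, {})
# Reload with the extra SSM artifact: the trained state is fully recovered.
full = restore(base, saved_adapter, saved_ssm)
```

In this toy setting, `lossy` carries the base values for the SSM core entries while `full` matches `trained` exactly. A real pipeline would need an equivalent check after reloading, for example comparing the reloaded SSM tensors against the in-memory values captured at the end of training.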