We describe NPC Fin 32B, a 32B-parameter financial-reasoning model fine-tuned from Qwen2.5-32B-Instruct via QLoRA on 32,496 supervised examples (59.7M tokens) spanning five domain tags: crypto-signal analysis, broad crypto knowledge, multi-path logic-tree reasoning, equities and macroeconomic analysis, and cross-asset correlation. Training labels were generated synthetically by Qwen2.5-72B-Instruct from signals exported from a production MongoDB database. The model achieves 93.6% on a 500-question internal financial-reasoning benchmark. The training run used DeepSpeed ZeRO-3 with full CPU offload across 12 NVIDIA H100 SXM5 80 GB GPUs for approximately 72 hours of wall-clock time, totalling 864 H100-hours. We document this as a recipe for 32B-scale domain-specialized supervised fine-tuning that fits the engineering surface of a small lab: a single multi-GPU node, an open-weight base model, and synthetically generated training labels. The paper's central observation is a config-vs-runtime drift that is invisible from the published model card alone. The training YAML, inherited from an earlier single-GPU plan, declared an effective batch size of 32 (per-device 4 × grad-accum 8), but the realized run, distributed across 12 GPUs under DeepSpeed ZeRO-3, scaled the global effective batch to approximately 384 (32 × 12). The optimizer's peak learning rate of 2e-4 was tuned for the planned batch and was not retuned for the realized 12× scale-up; square-root scaling (2e-4 × √12 ≈ 6.9e-4) would have suggested a peak nearer 7e-4. We document the discrepancy, discuss why the under-scaled LR did not destabilize training, and treat it as a real limitation of the recipe. The contribution is recipe-level: a documented, reproducible pipeline for a domain-specialized 32B reasoner on accessible multi-GPU hardware, with the config-drift bug and the experiments that were not run reported alongside the wins.
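The batch-size and learning-rate arithmetic above can be checked with a short sketch. The function names `effective_batch` and `scaled_lr` are hypothetical helpers introduced here for illustration, not part of the training code. The abstract does not name the scaling rule it applies; the quoted figure of roughly 7e-4 is consistent with square-root scaling, while linear scaling would give 2.4e-3, so both are shown.

```python
import math

def effective_batch(per_device: int, grad_accum: int, world_size: int) -> int:
    """Global effective batch: samples contributing to one optimizer step."""
    return per_device * grad_accum * world_size

def scaled_lr(base_lr: float, base_batch: int, new_batch: int, rule: str = "sqrt") -> float:
    """Rescale a peak learning rate for a changed batch size.

    rule="sqrt" multiplies by sqrt(new/base); rule="linear" multiplies by new/base.
    Which rule the authors had in mind is an assumption of this sketch.
    """
    ratio = new_batch / base_batch
    if rule == "sqrt":
        return base_lr * math.sqrt(ratio)
    if rule == "linear":
        return base_lr * ratio
    raise ValueError(f"unknown rule: {rule}")

# Values taken from the abstract.
planned = effective_batch(per_device=4, grad_accum=8, world_size=1)    # single-GPU plan
realized = effective_batch(per_device=4, grad_accum=8, world_size=12)  # 12-GPU run
gpu_hours = 12 * 72                                                    # GPUs x wall-clock hours

lr_sqrt = scaled_lr(2e-4, planned, realized, rule="sqrt")
lr_linear = scaled_lr(2e-4, planned, realized, rule="linear")

print(f"planned batch:  {planned}")
print(f"realized batch: {realized}")
print(f"H100-hours:     {gpu_hours}")
print(f"sqrt-scaled LR:   {lr_sqrt:.2e}")
print(f"linear-scaled LR: {lr_linear:.2e}")
```

Either rule puts the suggested peak above the 2e-4 actually used, which is why the realized run is described as under-scaled rather than over-scaled.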
Rama Krishna Bachu (Mon,) studied this question.