Process Reward Models (PRMs) score individual reasoning steps for correctness and have become central to recent reasoning-model training pipelines. Most published PRMs are math-only and use a single scalar correctness signal; we examine whether multi-dimensional scoring on a domain-specialized reasoning task (DeFi/crypto financial analysis) is worth the extra modeling surface. We describe NPC Fin-PRM 7B, a 7B QLoRA-trained PRM that scores four dimensions (factual_accuracy, logical_validity, completeness, and risk_awareness), trained in 17.4 hours on a single H100 from approximately 80,000 step-level judge labels generated by Qwen2.5-72B over 4,866 reasoning trees. The judge is served locally on the same H100 via vLLM. On a stratified 200-example held-out validation split, the model achieves Spearman 0.92 against judge labels (rating accuracy 88.5%, error-detection F1 0.84) at MAE 0.04 on the 0-1 score scale. Out-of-distribution evaluation on 307 gold-correct math-reasoning steps from GSM8K and MATH-500 finds only 5.2% mis-flagged as flawed and a mean overall_score of 0.856. This is substantially better cross-domain transfer than expected for a DeFi-only training corpus, with the side effect that the model extrapolates beyond its training labels by emitting EXCELLENT and PERFECT ratings on 3.9% of OOD math steps despite never being trained to produce them. Two findings stand out. First, three of the four dimensions form a tightly correlated cluster (pairwise Spearman 0.85-0.92) that is largely captured by a single axis; the judge's overall score is 95% explained by logical_validity alone. Second, despite Spearman 0.92 the model is poorly calibrated as a probability (ECE = 0.21): it over-flags in the 0.1-0.5 score band and under-flags around 0.5-0.7, so the score scale should not be used as a calibrated probability without a Platt-scaling or isotonic-regression sidecar.
The contribution is recipe-level: a complete pipeline for domain-specialized process reward modeling on accessible hardware, with the limitations and unmet experiments reported alongside the wins.
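To make the calibration caveat concrete, the following is a minimal sketch of how an expected calibration error (ECE) could be measured and how an isotonic-regression sidecar could be fitted on held-out data. It is not the paper's evaluation code. It assumes that each step has a raw score in [0, 1] and a binary label (1 if the judge considers the step correct), that ECE uses 10 equal-width bins, and that isotonic regression is done with the pool-adjacent-violators algorithm. The function names are hypothetical.

```python
from bisect import bisect_left


def expected_calibration_error(scores, labels, n_bins=10):
    """Equal-width-bin ECE: count-weighted mean of |avg score - observed correct rate|."""
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    bins = [[0, 0.0, 0.0] for _ in range(n_bins)]  # count, sum of scores, sum of labels
    for s, y in zip(scores, labels):
        idx = min(int(s * n_bins), n_bins - 1)  # a score of 1.0 falls in the last bin
        bins[idx][0] += 1
        bins[idx][1] += s
        bins[idx][2] += y
    total = len(scores)
    return sum(abs(ssum / n - ysum / n) * n / total for n, ssum, ysum in bins if n)


def fit_isotonic(scores, labels):
    """Pool-adjacent-violators fit; returns (thresholds, values) of a step function."""
    # Pool tied scores first so each distinct score starts as one block.
    agg = {}
    for s, y in zip(scores, labels):
        ysum, n = agg.get(s, (0.0, 0))
        agg[s] = (ysum + y, n + 1)
    blocks = []  # each block: [sum of labels, count, largest score in the block]
    for s in sorted(agg):
        ysum, n = agg[s]
        blocks.append([ysum, n, s])
        # Merge backwards while the previous block's mean exceeds the current mean.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            ysum, n, smax = blocks.pop()
            blocks[-1][0] += ysum
            blocks[-1][1] += n
            blocks[-1][2] = smax
    thresholds = [b[2] for b in blocks]
    values = [b[0] / b[1] for b in blocks]
    return thresholds, values


def apply_isotonic(score, thresholds, values):
    """Map a raw score to the fitted value of the block whose upper edge covers it."""
    idx = min(bisect_left(thresholds, score), len(values) - 1)
    return values[idx]


# Toy data, invented for illustration only (not from the paper).
raw_scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
is_correct = [0, 0, 1, 0, 1, 0, 1, 1]
thresholds, values = fit_isotonic(raw_scores, is_correct)
calibrated = [apply_isotonic(s, thresholds, values) for s in raw_scores]
```

In practice the sidecar would be fitted on one held-out split and its ECE checked on a different one, since a step function fitted and evaluated on the same examples will look better calibrated than it is.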
Rama Krishna Bachu (Sun,) studied this question.