Abstract
Multimodal industrial anomaly detection (IAD), which integrates RGB and 3D information, has become a key technical direction for improving detection robustness and accuracy. Although prevailing cross-modal feature-mapping methods are efficient and lightweight, they suffer from two major limitations. First, they typically adopt a one-way modeling paradigm that regresses one modality from another and lack explicit interaction within a unified representation space, which makes it difficult to detect local, small-magnitude anomalies that appear in only one modality. Second, fusion-reconstruction methods derived from this paradigm rely on a single fusion stream optimized with a reconstruction loss. When trained solely on normal samples, such a stream can overgeneralize, and it has no parallel branch to enforce consistency constraints on the fused representations, which limits reliable discrimination between normal and anomalous patterns in complex multimodal scenarios. To address these issues, we propose FMFR, a feature-level multistage fusion and remapping framework that jointly models multistage feature fusion and cross-modal remapping. The framework consists of a fusion-reconstruction branch and a remapping-fusion branch, which are jointly constrained by a multi-order consistency loss. In the fusion-reconstruction branch, a reconstruction loss supervises the intermediate fusion layers, encouraging them to learn joint representations that retain complete information and to reconstruct features without losing critical details. In the remapping-fusion branch, the network learns bidirectional mappings between modalities and re-fuses the remapped features, while the multi-order consistency loss aligns its fused representations with those of the fusion-reconstruction branch.
During inference, FMFR combines intra-modal reconstruction residuals, cross-modal remapping residuals, and the consistency deviation between the fused embeddings of the two branches to construct multi-source anomaly maps. Under this design, normal regions must satisfy both intra-modal and cross-modal priors, so an anomaly that violates any of them is exposed. This suppresses the overgeneralization of a single fusion stream, makes local anomaly structures that exist in only one modality more visible, and improves the overall robustness of anomaly detection. Experimental results on the MVTec 3D-AD dataset show that FMFR achieves performance competitive with the state of the art on both anomaly detection and anomaly segmentation tasks.
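The inference step described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the three signals are measured or combined, so the function names, the use of Euclidean distance for residuals, cosine deviation for branch consistency, and the element-wise product used to combine the maps are all assumptions made for illustration.

```python
import numpy as np


def l2_residual(a, b):
    """Per-location Euclidean distance between two (H, W, C) feature maps."""
    return np.linalg.norm(a - b, axis=-1)


def cosine_deviation(a, b, eps=1e-8):
    """Per-location 1 - cosine similarity between two (H, W, C) feature maps."""
    num = np.sum(a * b, axis=-1)
    den = np.linalg.norm(a, axis=-1) * np.linalg.norm(b, axis=-1) + eps
    return 1.0 - num / den


def multi_source_anomaly_map(feats, recons, remaps, fused_fr, fused_rf):
    """Combine three anomaly signals into one (H, W) map.

    feats, recons, remaps: dicts keyed by modality name, each value (H, W, C).
    fused_fr, fused_rf: fused embeddings of the two branches, each (H, W, C).
    The product rule below is an assumed combination, not taken from the paper.
    """
    intra = sum(l2_residual(feats[m], recons[m]) for m in feats)
    cross = sum(l2_residual(feats[m], remaps[m]) for m in feats)
    consistency = cosine_deviation(fused_fr, fused_rf)
    return intra * cross * consistency


# Hypothetical toy data: two modalities on a 4x4 grid with 8 channels.
rng = np.random.default_rng(0)
feats = {m: rng.normal(size=(4, 4, 8)) for m in ("rgb", "pc")}
recons = {m: f.copy() for m, f in feats.items()}   # perfect reconstruction
remaps = {m: f.copy() for m, f in feats.items()}   # perfect remapping
fused_fr = rng.normal(size=(4, 4, 8))
fused_rf = fused_fr.copy()                         # branches agree everywhere

# Inject a defect visible only in the RGB features at location (2, 1).
recons["rgb"][2, 1] += 1.0
remaps["rgb"][2, 1] += 1.0
fused_rf[2, 1] = -fused_fr[2, 1]

score = multi_source_anomaly_map(feats, recons, remaps, fused_fr, fused_rf)
peak = np.unravel_index(np.argmax(score), score.shape)
print("most anomalous location:", peak)
```

In this toy setup the map is zero wherever reconstruction, remapping, and branch agreement all hold, and it peaks at the single location where the RGB features were perturbed. A weighted sum would be an equally plausible combination rule; the product is used here only because it keeps the example short.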
Chen Wang
Southwest University
Heng Zhang
Journal of Computational Design and Engineering
Southwest University
Southwest University of Political Science & Law
Wang et al. (Fri,) studied this question.
synapsesocial.com/papers/69a52e04f1e85e5c73bf15c7 — DOI: https://doi.org/10.1093/jcde/qwag016