This paper proposes a bottleneck-based multimodal fusion Transformer network (BMFNet, Bottleneck-based Multimodal Fusion Network) for intelligent recognition of welding defects. The method combines synchronously acquired molten pool image sequences with arc voltage and current signals, overcoming the limitations of traditional single-modality detection in complex welding scenarios. The welding process is governed by coupled thermal, mechanical, and electrical fields, so data from a single sensor can hardly reflect weld seam quality in full. To address this, BMFNet adopts the MobileNetV3 bottleneck module as a lightweight feature extractor, which reduces the number of model parameters and the computational overhead while preserving representation capability. In addition, by combining the local perception of convolutional neural networks (CNNs) with the global modeling capability of Transformers, BMFNet builds a multimodal feature interaction mechanism that alleviates semantic loss during information fusion and improves sensitivity to small defects and dynamic processes. Experimental results show that BMFNet significantly improves recognition precision and that its inference speed meets industrial real-time detection requirements, providing an efficient and robust paradigm for intelligent welding quality monitoring.
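To make the parameter-saving claim concrete, the following minimal sketch counts the weights of a plain convolution and of an inverted-residual bottleneck of the kind used in MobileNetV3 (1x1 expansion, depthwise convolution, 1x1 projection). The function names, channel widths, kernel size, and expansion ratio are hypothetical and chosen only for illustration; they are not BMFNet's actual configuration, and bias, batch-normalization, and squeeze-and-excitation parameters are omitted.

```python
def standard_conv_params(c_in, c_out, k):
    """Weights in a plain k x k convolution (bias omitted)."""
    return k * k * c_in * c_out


def bottleneck_params(c_in, c_out, k, expand):
    """Weights in an inverted-residual bottleneck (bias, BN and SE omitted).

    Structure: 1x1 expansion -> k x k depthwise -> 1x1 projection.
    """
    c_mid = c_in * expand
    expand_w = c_in * c_mid        # 1x1 pointwise expansion
    depthwise_w = k * k * c_mid    # one k x k filter per expanded channel
    project_w = c_mid * c_out      # 1x1 pointwise projection
    return expand_w + depthwise_w + project_w


# Hypothetical layer sizes, for illustration only.
plain = standard_conv_params(c_in=64, c_out=64, k=3)
light = bottleneck_params(c_in=64, c_out=64, k=3, expand=2)
print(f"plain conv: {plain} weights, bottleneck: {light} weights")
```

Under these assumed sizes the bottleneck needs roughly half the weights of the plain convolution. The saving depends on the expansion ratio and channel width: a large expansion ratio can cancel it out, so the comparison should be repeated with the layer sizes actually used.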
Li et al. (Sun,) studied this question.