Existing CNN-based visual malware classification methods are often constrained by an inductive-bias mismatch: standard isotropic convolution kernels and global pooling operations neglect the inherent structural anisotropy of malware images, and the resulting models struggle with the spatial rearrangement of code blocks caused by obfuscation, which we term the “Malware Picasso Problem”. To overcome these limitations, we propose AD-CapsFPN, an end-to-end framework built around a synergistic “Rectification–Fusion–Inference” mechanism that moves from texture memorization toward spatial reasoning. Our approach rectifies anisotropic inductive biases in the feature extraction stage, dynamically aggregates cross-scale discriminative features and injects row-aware spatial biases in the intermediate layers, and adopts a global-pooling-free spatial routing strategy in the classification stage, thereby reconstructing the logical associations between obfuscated, scattered code blocks. Experiments on the large-scale Fusion dataset and the obfuscated Androdex dataset demonstrate significant performance improvements: on the Fusion dataset, our method improves macro F1-score by 16.22% over the MobileNetV4 baseline (reaching 97.98%), and on the highly challenging Androdex-Set1 it reaches a macro F1-score of 92.45%, outperforming state-of-the-art methods such as MDC-RepNet (88.97%) and TAEfficientNet (88.15%). This work shows that embedding malware domain priors into architecture design is key to robust malware classification.
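The results above are reported as macro F1-score, which averages the per-class F1 scores without weighting by class size, so small malware families count as much as large ones. The following is a minimal sketch of that metric in plain Python, not the authors' evaluation code; the function name `macro_f1` and the toy family labels are assumptions made for illustration only.

```python
from collections import Counter


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 scores (illustrative helper)."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class gains a false positive
            fn[t] += 1  # true class gains a false negative
    scores = []
    for c in labels:
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs
        denom = 2 * tp[c] + fp[c] + fn[c]
        scores.append(2 * tp[c] / denom if denom else 0.0)
    return sum(scores) / len(scores)


# Hypothetical toy labels: a majority family and a minority family.
y_true = ["family_a"] * 4 + ["family_b"] * 2
y_pred = ["family_a", "family_a", "family_a", "family_b", "family_b", "family_a"]
score = macro_f1(y_true, y_pred)
```

In this toy case the majority family has F1 = 0.75 and the minority family has F1 = 0.5, so the macro F1 is 0.625 even though accuracy is 4/6. The gap shows why macro averaging is a stricter measure on imbalanced malware datasets.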
Wang et al. (Fri,) studied this question.