Deep learning is widely used in vulnerability detection due to its high accuracy. However, existing models often fail to capture both token-level and function-level features. To address this limitation, a BERT-based Multi-Granularity Attention Network (BMGANet) is proposed. In BMGANet, Program Dependence Graphs (PDGs) are first constructed using the Joern tool, and Abstract Syntax Trees (ASTs) are extracted according to predefined vulnerability rules. Program slicing across user-defined functions and code normalization are then applied to improve analysis efficiency. The processed code slices are fed into a BERT network to extract initial token-level and function-level features. To overcome BERT’s limitation in modeling temporal dependencies, an LSTM network and a multi-head attention mechanism are applied in sequence to refine the token-level features. The refined token-level features are then fused with the function-level features for vulnerability detection. Two pretraining tasks, dynamic masked token prediction and inter-code-line logical correlation prediction, are introduced to strengthen the model’s ability to handle semantic gaps and weak logical connections. Experimental results on both synthetic and real-world datasets show that BMGANet outperforms state-of-the-art methods.
Zhu et al. studied this question.
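The dynamic masked token prediction task named in the abstract can be illustrated with a minimal sketch. The abstract does not give the masking ratio or replacement scheme, so the 15% rate and the 80/10/10 split below are assumptions borrowed from the usual BERT convention, and "dynamic" is taken to mean that the mask is re-sampled each time a slice is seen. The function name `dynamic_mask`, the placeholder tokens (`VAR1`, `FUN1`), and the example slice are hypothetical and are not taken from the paper.

```python
import random

MASK = "[MASK]"  # assumed mask symbol, following BERT's convention


def dynamic_mask(tokens, vocab, mask_prob=0.15, rng=None):
    """Return (masked_tokens, labels) for one pass over a code slice.

    labels[i] holds the original token at selected positions and None
    elsewhere. Calling this again yields a fresh mask, which is what makes
    the masking "dynamic" in this sketch.
    """
    rng = rng or random.Random()
    masked = list(tokens)
    labels = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # position not selected for prediction
        labels[i] = tok
        roll = rng.random()
        if roll < 0.8:
            masked[i] = MASK               # assumed: 80% replaced by [MASK]
        elif roll < 0.9:
            masked[i] = rng.choice(vocab)  # assumed: 10% replaced by a random token
        # assumed: remaining 10% keep the original token
    return masked, labels


# Hypothetical normalized slice; the VAR*/FUN* names stand in for whatever
# the paper's code normalization step actually produces.
slice_tokens = "int VAR1 = FUN1 ( VAR2 ) ; memcpy ( VAR3 , VAR4 , VAR1 ) ;".split()
vocab = sorted(set(slice_tokens))

rng = random.Random(0)
for epoch in range(3):
    masked, labels = dynamic_mask(slice_tokens, vocab, rng=rng)
    print(epoch, " ".join(masked))
```

Because the mask is drawn inside the training loop rather than fixed once during preprocessing, the model sees a different set of hidden tokens for the same slice in each epoch.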