What question did this study set out to answer?

The study set out to improve archival openness auditing by combining deep semantic modeling of long archival texts with post hoc interpretability of the model's decisions.

March 13, 2026 · Open Access

Building Trustworthy Digital Archival Services: A Deep Semantic Auditing Approach Based on SHAP Interpretability

Key Points

The study aims to improve archival openness auditing through deeper semantic understanding and interpretable decisions.
The authors developed an Assisted Archival Auditing Model (ALC-MCFN) that incorporates semantic-aware dynamic truncation.
The model fuses features from BERT, TextCNN, and TextGCN to capture global, local, and logical semantics.
SHAP provides post hoc interpretability by showing how key textual features contribute to each auditing decision.
ALC-MCFN achieved a 77.21% F1-score on the self-built OParchives dataset.
This score exceeded the BERT baseline by 1.15 percentage points.
The SHAP-based explanations make the model's decisions more transparent and trustworthy.
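
The summary does not say how the semantic-aware dynamic truncation or the feature fusion is implemented. As a rough illustration of the general idea only, the sketch below assumes that truncation keeps the most relevant sentences within a token budget while preserving their original order, and that fusion is a plain concatenation of three feature vectors. The function names `truncate_by_relevance` and `fuse_views`, the keyword-overlap score, and the sample text are all hypothetical and are not taken from ALC-MCFN.

```python
import re


def truncate_by_relevance(text, keywords, max_tokens):
    """Keep the highest-scoring sentences, in original order, within a token budget.

    Assumption: relevance is the share of a sentence's tokens found in a
    keyword list. The actual model presumably uses a learned semantic score.
    """
    # Split after sentence-ending punctuation followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    keyword_set = {k.lower() for k in keywords}

    def score(sentence):
        tokens = re.findall(r"\w+", sentence.lower())
        return sum(t in keyword_set for t in tokens) / max(len(tokens), 1)

    # Rank sentence indices by score, best first; ties keep document order.
    ranked = sorted(range(len(sentences)), key=lambda i: (-score(sentences[i]), i))

    kept, used = [], 0
    for i in ranked:
        length = len(sentences[i].split())
        if used + length <= max_tokens:
            kept.append(i)
            used += length

    # Restore document order so the logical flow of the record is preserved.
    return " ".join(sentences[i] for i in sorted(kept))


def fuse_views(global_vec, local_vec, graph_vec):
    """Concatenate three feature views into one vector (assumed fusion strategy)."""
    return list(global_vec) + list(local_vec) + list(graph_vec)


if __name__ == "__main__":
    sample = (
        "The committee met on Tuesday. "
        "The file contains personal medical records of named staff. "
        "Lunch was served at noon. "
        "Disclosure is restricted under the confidentiality clause."
    )
    terms = ["personal", "medical", "records",
             "restricted", "confidentiality", "disclosure"]
    print(truncate_by_relevance(sample, terms, max_tokens=16))
    # Placeholder vectors standing in for BERT, TextCNN, and TextGCN outputs.
    print(fuse_views([0.1, 0.2], [0.3], [0.4, 0.5]))
```

With a budget of 16 whitespace-separated tokens, the sample keeps only the two sentences that mention sensitive content and drops the two incidental ones. In a real pipeline the three vectors passed to `fuse_views` would be the encoder outputs, and the fused vector would feed a classifier.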

Abstract

Archival openness auditing sits at the intersection of data science and archival management and is a critical step in granting public access to information, but it must process long texts in which the core information is sparse. To address this, this paper proposes an Assisted Archival Auditing Model (ALC-MCFN) built on deep semantic understanding and decision transparency, with the goal of using intelligent analytics to improve archival openness decisions. For deep semantic understanding, a semantic-aware dynamic truncation mechanism first removes redundant text while preserving key logical structures. The model then fuses global, local, and logical semantic features extracted by BERT, TextCNN, and TextGCN, overcoming the limitations of single-view feature representation. To address the “black box” problem of deep learning in compliance auditing, the SHAP method is introduced to provide post hoc interpretability: visualizing the contribution of key textual features to the auditing results makes decisions more transparent and trustworthy. In experiments on the self-built archival-domain OParchives dataset, ALC-MCFN outperforms mainstream baseline models with a 77.21% F1-score, 1.15 percentage points above the BERT baseline, supporting risk control and efficiency gains in intelligent archival management.
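
SHAP attributes a prediction to input features using Shapley values. As a minimal sketch of that underlying idea, the code below computes exact Shapley values by enumerating every coalition of three tokens. It does not use the `shap` library, and the abstract does not say which SHAP explainer the authors applied. The scorer `toy_restrict_score` and its weights are invented for illustration and stand in for the auditing model's output.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value_fn):
    """Exact Shapley value of each feature for a set-valued scoring function.

    `features` must contain unique items. `value_fn` takes a frozenset of the
    features that are present and returns a number.
    """
    n = len(features)
    values = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of f when it joins this coalition.
                total += weight * (value_fn(coalition | {f}) - value_fn(coalition))
        values[f] = total
    return values


def toy_restrict_score(present):
    """Hypothetical stand-in for a model's 'restrict access' score."""
    score = 0.0
    if "confidential" in present:
        score += 0.5
    if "salary" in present:
        score += 0.2
    if "confidential" in present and "salary" in present:
        score += 0.2  # interaction term, shared equally by the two tokens
    return score


if __name__ == "__main__":
    tokens = ["confidential", "salary", "meeting"]
    for token, value in shapley_values(tokens, toy_restrict_score).items():
        print(f"{token}: {value:.3f}")
```

In this toy case the attributions are about 0.6 for "confidential", 0.3 for "salary", and 0 for "meeting". They sum to the gap between the score with all tokens present and the score with none, which is the additivity property that makes such attributions usable as an audit trail. Exact enumeration grows exponentially with the number of features, so practical SHAP explainers approximate it.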
