What question did this study set out to answer?

This research aims to improve the detection of financial statement fraud using advanced language processing techniques.

February 12, 2026

Open Access

Improving Financial Statement Fraud Detection: A Large Language Model Processing Approach

Key Points

Developed a representation learning method focusing on MD&A documents.
Aligned paragraphs from consecutive disclosures by their similarity.
Categorized paragraph changes into added, deleted, and matched types.
Created multivariate change trajectory representations using fraud-related word categories.
Compared the new model against traditional and time-series models.
The new method significantly enhanced fraud detection performance across 11 machine learning models.
Outperformed traditional word frequency methods consistently.
Demonstrated effective tracking of MD&A changes over time.
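
The alignment and categorization steps listed above can be illustrated with a minimal sketch. It is not the authors' implementation: the bag-of-words cosine similarity stands in for whatever paragraph representation the paper uses, and the greedy one-to-one matching, the 0.5 threshold, and the function names (bow, cosine, align_paragraphs) are assumptions made for illustration only.

```python
import math
import re
from collections import Counter


def bow(text):
    """Lowercased bag-of-words counts (assumed stand-in for a paragraph embedding)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def align_paragraphs(prev_paras, curr_paras, threshold=0.5):
    """Greedily pair paragraphs of two consecutive disclosures by similarity.

    Pairs at or above the (assumed) threshold are 'matched'; unpaired
    paragraphs of the earlier filing are 'deleted' and unpaired paragraphs
    of the later filing are 'added'.
    """
    prev_vecs = [bow(p) for p in prev_paras]
    curr_vecs = [bow(p) for p in curr_paras]
    # Score every (previous, current) pair, most similar first.
    scored = sorted(
        ((cosine(pv, cv), i, j)
         for i, pv in enumerate(prev_vecs)
         for j, cv in enumerate(curr_vecs)),
        reverse=True,
    )
    matched, used_prev, used_curr = [], set(), set()
    for sim, i, j in scored:
        if sim < threshold:
            break  # all remaining pairs are less similar
        if i in used_prev or j in used_curr:
            continue  # each paragraph is matched at most once
        matched.append((i, j))
        used_prev.add(i)
        used_curr.add(j)
    deleted = [i for i in range(len(prev_paras)) if i not in used_prev]
    added = [j for j in range(len(curr_paras)) if j not in used_curr]
    return {"matched": sorted(matched), "deleted": deleted, "added": added}


# Toy MD&A paragraphs, invented for this example.
prev_year = [
    "Revenue increased due to strong product sales.",
    "We face litigation risk from pending lawsuits.",
]
curr_year = [
    "Revenue increased due to strong product sales in Asia.",
    "We restated prior period earnings after an internal review.",
]
result = align_paragraphs(prev_year, curr_year)
print(result)
```

In this toy case the revenue paragraph is matched across the two years, the litigation paragraph counts as deleted, and the restatement paragraph counts as added. A production system would likely use a learned embedding and a tuned threshold instead of raw word counts.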

Abstract

As AI technology becomes prevalent on the Internet, financial fraud has become a pressing problem, especially in the context of machine learning. Technologies such as deep learning and natural language processing provide effective tools for detecting fraud from financial statements, improving the efficiency and accuracy of data analysis and helping to ensure financial safety. In this study, we propose a representation learning method that detects financial statement fraud by tracking detailed changes in a company’s Management Discussion and Analysis (MD&A) documents over time. Unlike traditional word frequency methods, we align paragraphs between consecutive disclosures based on paragraph-level similarity and categorize them into three types: added, deleted, and matched. Next, we create multivariate change trajectory representations based on fraud-related word categories. Finally, we use these word-level change trajectories to design a fraud detection model and compare it with several traditional models as well as the latest Time-Series Foundation Models. Experiments on 24 years of financial report data, from 1995 to 2019, show that our representation learning method significantly improves the performance of financial statement fraud detection across 11 different machine learning models, consistently outperforming traditional word frequency methods. Our method opens a new paradigm for feature engineering in financial statement fraud detection. Our code can be found at https://github.com/LittelStudent/Financial-Statement-Fraud-Detection-ParaEmb-FraudW2V.
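
To make the idea of a multivariate change trajectory concrete, the sketch below turns per-year lists of added and deleted paragraphs into one time series per word category. It is a hedged illustration, not the method from the paper or its repository: the two-category lexicon, the net-count feature (mentions gained minus mentions lost), and the names CATEGORIES, category_counts, and change_trajectory are all hypothetical, and matched paragraphs are left out for brevity.

```python
import re
from collections import Counter

# Hypothetical fraud-related word categories; the paper's lexicon is not
# given in the abstract.
CATEGORIES = {
    "litigation": {"lawsuit", "lawsuits", "litigation"},
    "restatement": {"restated", "restatement"},
}


def category_counts(paragraphs):
    """Count how many tokens in the paragraphs fall into each category."""
    counts = Counter()
    for text in paragraphs:
        for token in re.findall(r"[a-z']+", text.lower()):
            for name, words in CATEGORIES.items():
                if token in words:
                    counts[name] += 1
    return counts


def change_trajectory(changes_by_year):
    """Build one series per category from yearly paragraph changes.

    changes_by_year is a list of (year, added_paragraphs, deleted_paragraphs)
    tuples in chronological order. Each series holds, per year, the category
    mentions gained through added paragraphs minus those lost through
    deleted paragraphs (an assumed feature definition).
    """
    trajectory = {name: [] for name in CATEGORIES}
    for _year, added, deleted in changes_by_year:
        gained = category_counts(added)
        lost = category_counts(deleted)
        for name in CATEGORIES:
            trajectory[name].append(gained[name] - lost[name])
    return trajectory


# Toy yearly changes, invented for this example.
changes = [
    (2017, ["We face litigation from two pending lawsuits."], []),
    (2018,
     ["We restated prior period earnings."],
     ["We face litigation from two pending lawsuits."]),
]
trajectory = change_trajectory(changes)
print(trajectory)
```

Stacking the per-category series gives a multivariate time series for each company, which could then be passed to a conventional classifier or a time-series model of the kind the abstract compares against.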

Authors

Yue Yu

Ministry of Natural Resources

Zhen Wu

Institute of Information Engineering

Yanni Han

Institute of Information Engineering

Journals

ACM Transactions on Internet Technology

Institutions

Fordham University

Institute of Information Engineering
