Detecting financial fraud remains a persistent challenge because transaction data are dynamic and highly imbalanced. This paper proposes the Graph-Temporal Contrastive Transformer (GTCT), a framework that models both the structural dependencies between accounts and the temporal evolution of transactional behavior. The model combines three components: a graph encoder that models relationships between accounts, a temporal encoder that learns sequential patterns in transactions, and a contrastive learning objective that makes the learned representations more robust when supervision is limited. To assess the contribution of each component, we conduct an ablation study that removes one module at a time. Excluding the contrastive loss reduced recall from 0.867 to 0.805 and AUC from 0.982 to 0.948, indicating the importance of self-supervised representation learning for fraud detection. Similarly, removing the graph encoder decreased the F1-score from 0.876 to 0.786, confirming that modeling the transaction structure between accounts is crucial for identifying complex fraud rings. Excluding the temporal encoder led to a larger drop, with recall falling to 0.743 and AUC to 0.905, indicating that capturing the temporal dynamics of transactions is particularly important. Across all variants, the full GTCT model attained the highest accuracy (0.975) and AUC (0.982), showing superior robustness in detecting sophisticated and evolving financial fraud patterns.
Olaniyan et al. (Mon,) studied this question.
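To make the ablation figures in the abstract easier to compare, the following minimal sketch computes the absolute and relative drop of each reported metric against the full GTCT model. It uses only the numbers stated in the abstract; the dictionary keys and the helper name `metric_drops` are illustrative assumptions, not part of the authors' code, and metrics the abstract does not report for a variant are left out.

```python
# Metrics reported in the abstract for the full GTCT model.
FULL = {"recall": 0.867, "auc": 0.982, "f1": 0.876}

# Ablated variants; only the metrics the abstract actually reports are listed.
# The variant names are hypothetical labels chosen for this sketch.
ABLATIONS = {
    "no_contrastive_loss": {"recall": 0.805, "auc": 0.948},
    "no_graph_encoder": {"f1": 0.786},
    "no_temporal_encoder": {"recall": 0.743, "auc": 0.905},
}


def metric_drops(full, ablated):
    """Return {metric: (absolute_drop, relative_drop)} for each ablated metric."""
    drops = {}
    for metric, value in ablated.items():
        absolute = full[metric] - value
        relative = absolute / full[metric]
        drops[metric] = (round(absolute, 3), round(relative, 3))
    return drops


drops = {name: metric_drops(FULL, values) for name, values in ABLATIONS.items()}

for name, per_metric in drops.items():
    for metric, (absolute, relative) in per_metric.items():
        print(f"{name:22s} {metric:6s} drop={absolute:.3f} ({relative:.1%} of full)")
```

On these figures, removing the temporal encoder costs about twice as much recall (0.124) as removing the contrastive loss (0.062), which is the comparison behind the abstract's statement about temporal dynamics.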