March 8, 2026

Developing a Testing Framework for Applications of LLMs within Islamic Finance

Abstract

This study evaluates the capabilities of Large Language Models (LLMs) in processing Shariah-related queries within Islamic finance. We introduce a three-part benchmark framework: first, a multiple-choice dataset testing factual knowledge; second, a vulnerability dataset assessing resistance to erroneous fatwas; and third, an applied reasoning dataset evaluating usul al-fiqh methodology. Six models, including ChatGPT, Claude, and a domain-aligned Islamic model, were tested. Results confirm that LLMs are unqualified to issue new Islamic legal rulings, showing susceptibility to theological drift under adversarial prompting. Models also struggled to reliably apply established rulings to familiar scenarios, displaying weaknesses in legal maxims and cross-school reasoning. However, several models demonstrated utility in factual retrieval and the summarization of Islamic finance concepts. This framework provides the first structured benchmark for evaluating Islamic finance AI applications.

Cite This Study

Al-Syed et al. studied this question.

synapsesocial.com/papers/69acc57d32b0ef16a404fb22 https://doi.org/10.1142/s2811023425500029