Artificial intelligence (AI) is increasingly being explored as a tool for legal research and reasoning, yet its effectiveness in applying South African legal principles remains underexamined. This study evaluates the performance of five generative AI models (ChatGPT 4o, Claude 3.7 Sonnet, DeepSeek R1, Gemini 2.0 Flash, and Grok 3 Beta) across three private law scenarios involving the actio de pauperie, negotiorum gestio, and the actio legis Aquiliae. Each model's response was assessed against seven criteria: identification of the correct legal action, accuracy of the legal requirements, application to the facts, case law citation, relevance of the case law cited, consideration of defences, and clarity of the final answer. The findings reveal that while the models generally identify and apply South African legal principles correctly, their performance varies significantly. Claude performed strongest overall, demonstrating structured legal reasoning and engagement with statutory provisions; ChatGPT followed closely but was undermined by hallucinated case law. DeepSeek provided sound reasoning but occasionally misapplied legal principles. Gemini and Grok were the weakest, with incomplete legal analyses and limited engagement with case law. A key limitation across all models was the unreliable retrieval and application of case law, with frequent misinterpretations and fabricated references. Additionally, most models failed to incorporate statutory law unless explicitly prompted. These results underscore the potential of AI as a supplementary legal tool while highlighting its current limitations. Future research should explore AI's competency in broader areas of South African law, including statutory interpretation and constitutional analysis, to better understand its role in legal practice and academia. While AI can assist legal professionals, human oversight remains essential to ensure doctrinal accuracy and the reliability of case law citations.
Donrich Thaldar studied this question.