Abstract. Legal reasoning is complex and multi-faceted, requiring a broad set of skills. Drawing on domain knowledge from legal experts, we design five prompt elements for large language models that could aid in legal reasoning tasks: additional legal guidelines, 1-shot prompting, dictionary definitions, knowledge representations of legal articles, and IRAC-style prompting. We investigate the effect of each prompt element on model performance on a legal entailment task. Certain prompt elements can improve performance, depending on the context and the model. For the smaller models, increasing the number of prompt elements improves performance on average. For any particular combination of model and sub-task, however, using only a subset of the prompt elements seems to work best. For the most advanced reasoning model we evaluate, using a selection of prompt elements increases average performance across all evaluated sub-tasks. The results indicate that the problem space of the legal entailment task may be too large for a single model and prompt. In future research, we therefore aim to investigate the capabilities of an ensemble of specialized models.
Steging et al. studied this question.
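To make the experimental design concrete, the comparison over prompt elements can be pictured as an enumeration of subsets of the five elements, each subset yielding one prompt variant. The following is only a minimal sketch under stated assumptions: the element keys, the `build_prompt` helper, and the choice to place the selected elements before the task text are all hypothetical and are not taken from the paper, which may order or combine the elements differently.

```python
from itertools import combinations

# The five prompt elements named in the abstract.
# The short keys are hypothetical labels chosen for this sketch.
PROMPT_ELEMENTS = (
    "guidelines",
    "one_shot",
    "definitions",
    "knowledge_representation",
    "irac",
)


def element_subsets(elements=PROMPT_ELEMENTS):
    """Yield every subset of the elements, from the empty set to all of them."""
    for size in range(len(elements) + 1):
        for subset in combinations(elements, size):
            yield subset


def build_prompt(task_text, subset, snippets):
    """Join the text of each selected element, followed by the task itself.

    `snippets` maps an element key to its prompt text. Putting the elements
    before the task is an assumption of this sketch.
    """
    parts = [snippets[name] for name in subset]
    parts.append(task_text)
    return "\n\n".join(parts)
```

With five elements there are 2^5 = 32 possible subsets, including the empty baseline prompt, which illustrates why the best-performing subset can differ for each combination of model and sub-task.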