This study investigates the effectiveness of various model architectures and training strategies for legal text classification, with a focus on entailment classification for case decisions from the Federal Court of Canada. We compare the performance of RoBERTa models, with and without domain-specific further pretraining, with that of larger language models such as Llama 2, Llama 3, and GPT-4o, adapted to the task through prompt engineering and LoRA fine-tuning. Additionally, we investigate different methods for explaining the models' decisions and evaluate their adequacy, understandability, trustworthiness, and sufficiency. Our findings suggest that, for legal entailment classification, domain-specific pretraining can improve performance for smaller models, while larger language models fine-tuned with LoRA show promise in outperforming prompt engineering for classification and in generating more interpretable explanations. To the best of our knowledge, this is the first study in the context of Canadian legal AI to explore both the effects of further pretraining on small and large language models and the integration of language model adaptation and explainability into one system for legal text entailment classification.
Michel Custeau
Diana Inkpen
University of Ottawa
Custeau et al. studied this question.
synapsesocial.com/papers/69a75acec6e9836116a211fa — DOI: https://doi.org/10.21428/594757db.d23b1499