In the legal domain, huge volumes of complex documents require accurate summarization to support quicker decision making, better legal analysis, and greater accessibility. Legal language is intricate: summaries must keep facts consistent, preserve contextual nuance, and often reflect domain-specific reasoning. Traditional text summarization approaches grounded in supervised learning are unable to handle these nuances. In this research, we present a novel reinforcement learning (RL) framework for legal summarization that combines Proximal Policy Optimization (PPO) with Reinforcement Learning from Human Feedback (RLHF). Rather than depending on hand-crafted metrics, our approach fine-tunes transformer-based summarization models using structured reward functions aligned with the principles that characterize human-written summaries: legal fidelity, factual correctness, and logical coherence. Human-in-the-loop feedback is integrated to refine summaries iteratively, improving legal fidelity and adaptability. Through extensive experimentation across diverse legal datasets, our framework outperforms standard transformer-based baselines (BART, T5) in both quantitative and qualitative evaluations. We also demonstrate the robustness of our model through ablation studies, cross-domain testing, and expert evaluations, which show its ability to generalize and its conformity with the expectations of legal professionals.
Patil et al. (Thu,) studied this question.