Purpose
This paper presents a modular pipeline that combines extractive and abstractive summarisation methods to improve the accuracy of automatic text summarisation (ATS).

Design/methodology/approach
Our framework uses LangChain 20, a large language model integration platform, to orchestrate an extractive algorithm (LexRank) and an abstractive algorithm (Bidirectional and Auto-Regressive Transformers (BART)). LexRank identifies key sentences in the input text, so that core information is retained without redundancy; BART then refines these extracted sentences into concise, human-like summaries.

Findings
We evaluated the framework on the CNN/Daily Mail, Newsroom and XSum datasets, which are widely used benchmarks for summarisation tasks. The results demonstrate that this modular approach improves the accuracy of generated summaries in two ways: (1) word-level similarity between the reference summary and the generated summary and (2) semantic similarity between the reference summary and the generated summary.

Research limitations/implications
Our experiments used datasets of short documents (within 7,000 characters). More experiments with long documents are needed to evaluate the robustness of the pipeline. We have not investigated error propagation from the extractive phase to the abstractive phase, which may affect the performance of the framework.

Originality/value
The primary contribution of this study is the development and evaluation of a modular pipeline that combines LexRank, an extractive summarisation technique, and BART, an advanced abstractive model, using LangChain. The results indicate that the hybrid approach outperforms traditional single-model methods and other existing hybrid models, as evidenced by higher Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and BERTScore values across various examples.
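The extract-then-abstract design described above can be illustrated with a minimal sketch. It is not the authors' implementation: it uses only the Python standard library, a simplified thresholded LexRank (TF-IDF cosine similarity graph plus power iteration), and a pluggable `abstractor` callable standing in for the abstractive stage, which in the framework described here would be BART orchestrated through LangChain. The function names, the naive sentence splitter, and the `threshold`, `damping`, `iters` and `k` defaults are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List


def split_sentences(text: str) -> List[str]:
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences: List[str]) -> List[Dict[str, float]]:
    docs = [Counter(tokenize(s)) for s in sentences]
    n = len(docs)
    df = Counter(term for d in docs for term in d)  # document frequency per term
    # Smoothed IDF keeps every weight positive.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [{t: tf * idf[t] for t, tf in d.items()} for d in docs]


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_scores(sentences: List[str], threshold: float = 0.1,
                   damping: float = 0.85, iters: int = 50) -> List[float]:
    n = len(sentences)
    if n == 0:
        return []
    vecs = tfidf_vectors(sentences)
    # Unweighted graph: connect two distinct sentences if they are similar enough.
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # PageRank-style update; isolated sentences keep only the base score.
        scores = [
            (1 - damping) / n
            + damping * sum(adj[j][i] * scores[j] / degree[j]
                            for j in range(n) if degree[j])
            for i in range(n)
        ]
    return scores


def extract(text: str, k: int = 3) -> List[str]:
    sentences = split_sentences(text)
    scores = lexrank_scores(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]  # restore document order


def summarise(text: str, abstractor: Callable[[str], str], k: int = 3) -> str:
    # Stage 1: extractive selection. Stage 2: abstractive rewriting (placeholder).
    return abstractor(" ".join(extract(text, k)))
```

Passing any function that maps a string to a string as `abstractor` completes the pipeline; keeping the two stages behind separate functions mirrors the modular design, since either stage can be swapped without touching the other.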
Mustafa et al. studied this question.