What question did this study set out to answer?

The aim is to improve the accuracy of automatic text summarisation by using a hybrid approach.

May 7, 2026 | Open Access

Automatic text summarisation using a modular pipeline approach with LangChain

Key Points

The aim is to improve the accuracy of automatic text summarisation by using a hybrid approach.
Developed a modular pipeline combining extractive (LexRank) and abstractive (BART) summarisation methods.
Used LangChain to integrate the summarisation algorithms into the framework.
Evaluated the framework on CNN/Daily Mail, Newsroom, and XSum datasets.
The modular pipeline improves accuracy as measured by both word-level and semantic similarity to reference summaries.
The hybrid approach outperformed traditional single-model methods and existing hybrid models.
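
To make "word-level similarity" concrete, here is a minimal sketch of a ROUGE-1 style unigram overlap score between a reference and a generated summary. It is an illustration only: the function names `tokenize` and `rouge1` and the simple lowercase tokenisation are assumptions, and the paper's exact ROUGE variants and tooling are not stated on this page. Semantic similarity (BERTScore) needs a pretrained model and is not sketched here.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens (simplified)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def rouge1(reference, candidate):
    """Unigram-overlap precision, recall and F1 (ROUGE-1 style, no stemming)."""
    ref_counts = Counter(tokenize(reference))
    cand_counts = Counter(tokenize(candidate))
    # Clipped overlap: each token counts at most as often as it occurs in both.
    overlap = sum((ref_counts & cand_counts).values())
    recall = overlap / sum(ref_counts.values()) if ref_counts else 0.0
    precision = overlap / sum(cand_counts.values()) if cand_counts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if overlap else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


if __name__ == "__main__":
    print(rouge1("the cat sat on the mat", "the cat lay on the mat"))
```

Higher recall means more of the reference summary's words appear in the generated summary; precision penalises generated words that the reference lacks.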

Abstract

Purpose: This paper presents a modular pipeline approach that combines extractive and abstractive summarisation methods in order to improve the accuracy of automatic text summarisation (ATS).

Design/methodology/approach: Our framework uses LangChain [20], a large language model integration platform, to orchestrate an extractive algorithm (LexRank) and an abstractive algorithm (Bidirectional and Auto-Regressive Transformers (BART)). LexRank identifies key sentences from the input text, ensuring core information is retained without redundancy. BART then refines these extracted sentences into human-like, concise summaries.

Findings: We evaluated the performance of our framework on the CNN/Daily Mail, Newsroom and XSum datasets, which are widely used benchmarks for summarisation tasks. The evaluation results demonstrate that this modular approach improves the accuracy of generated summaries in two ways: (1) word-level similarity between the reference summary and the generated summary and (2) semantic similarity between the reference summary and the generated summary.

Research limitations/implications: In our experiments, we used datasets that contain short documents (within 7,000 characters). More experiments with long documents are needed to evaluate the robustness of our pipeline. We have not investigated error propagation from the extractive phase to the abstractive phase, which may affect the performance of our framework.

Originality/value: The primary contribution of this study is the development and evaluation of a modular pipeline combining LexRank, an extractive summarisation technique, and BART, an advanced abstractive model, using LangChain. The results indicate that the hybrid approach outperforms traditional single-model methods and other existing hybrid models, as evidenced by higher Recall-Oriented Understudy for Gisting Evaluation (ROUGE) scores and BERTScore values across various examples.
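
The two-stage design described in the abstract can be sketched as follows. This is a simplified, standard-library-only illustration, not the authors' implementation: the extractive stage is a LexRank-style ranker (thresholded cosine-similarity graph over TF-IDF sentence vectors, scored with PageRank-style iteration), and the abstractive stage is passed in as a plain callable so that a BART model could be plugged in. All names (`lexrank_extract`, `hybrid_summarise`, the `threshold` and `damping` defaults) are assumptions made for this sketch, and the LangChain orchestration is not shown.

```python
import math
import re
from collections import Counter


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by space."""
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def tokenize(sentence):
    return re.findall(r"[a-z0-9]+", sentence.lower())


def tfidf_vectors(sentences):
    """Sparse TF-IDF vectors (dicts), treating each sentence as a document."""
    token_lists = [tokenize(s) for s in sentences]
    n = len(sentences)
    df = Counter(t for toks in token_lists for t in set(toks))
    return [
        {t: c * (math.log((1 + n) / (1 + df[t])) + 1.0)
         for t, c in Counter(toks).items()}
        for toks in token_lists
    ]


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def lexrank_extract(text, k=3, threshold=0.1, damping=0.85, iters=50):
    """Return the k highest-ranked sentences, kept in their original order."""
    sentences = split_sentences(text)
    n = len(sentences)
    if n <= k:
        return sentences
    vecs = tfidf_vectors(sentences)
    # Connect two sentences when their similarity reaches the threshold.
    adj = [[1.0 if i != j and cosine(vecs[i], vecs[j]) >= threshold else 0.0
            for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in adj]
    scores = [1.0 / n] * n
    for _ in range(iters):  # PageRank-style power iteration
        scores = [
            (1 - damping) / n + damping * sum(
                adj[j][i] * scores[j] / degree[j]
                for j in range(n) if degree[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


def hybrid_summarise(text, abstractive, k=3):
    """Extract key sentences first, then hand them to the abstractive stage."""
    return abstractive(" ".join(lexrank_extract(text, k=k)))


if __name__ == "__main__":
    doc = ("The cat sat on the mat. The cat ate the fish. "
           "Dogs bark loudly at night. The cat sat near the fish.")
    # Identity placeholder; a real setup would call an abstractive model here.
    print(hybrid_summarise(doc, abstractive=lambda s: s, k=2))
```

Keeping the abstractive stage as a parameter mirrors the modular idea: either stage can be swapped or tested on its own, which would also be one way to study the error propagation the authors list as a limitation.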

Cite This Study

Mustafa et al. studied this question.

synapsesocial.com/papers/69fbe2f2164b5133a91a255e https://doi.org/10.1108/aci-10-2025-0437
