What question did this study set out to answer?

This research focuses on developing a system to summarize and translate spoken audio content.

March 15, 2026 · Open Access

AI-Based Voice Summarizer and Language Translation System

Key Points

Developed an AI-Based Voice Summarizer and Language Translation System using OpenAI's Whisper model for transcription
Utilized the BART transformer model for summarization
Employed Helsinki-NLP's MarianMT models for language translation
Implemented a modular pipeline with Python and Flask backend and HTML/CSS/JavaScript frontend
Tested the system on various audio samples including meetings and lectures
Achieved approximately 99% transcription accuracy on clear speech
Obtained 65–75% content compression in generated summaries
Designed for use by students, professionals, and researchers to quickly extract information from audio
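
The page does not say how the accuracy and compression figures were measured. As a rough sketch only, the code below assumes two common definitions: transcription accuracy as one minus the word error rate (WER) against a reference transcript, and compression as the percentage reduction in word count from transcript to summary. The function names are hypothetical and are not taken from the paper.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by reference length (assumed metric)."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript is empty")
    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


def compression_percent(transcript: str, summary: str) -> float:
    """Percentage reduction in word count (assumed definition of compression)."""
    source_words = len(transcript.split())
    if source_words == 0:
        raise ValueError("transcript is empty")
    summary_words = len(summary.split())
    return 100.0 * (source_words - summary_words) / source_words
```

Under these assumptions, a 200-word transcript condensed to a 60-word summary gives 70% compression, which falls inside the reported 65–75% range, and one wrong word in a 100-word reference gives a WER of 0.01, or 99% accuracy.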

Abstract

Managing the growing volume of spoken audio from digital meetings, lectures, and podcasts has become a genuine challenge. This paper introduces an AI-Based Voice Summarizer and Language Translation System, a web-based platform that converts raw audio into structured, readable text through a chain of processing stages. The system uses OpenAI's Whisper model to transcribe speech to text, applies the BART transformer model to condense lengthy transcripts into concise summaries, and then passes those summaries through Helsinki-NLP's MarianMT models to generate translations in French, Spanish, Hindi, and Malayalam. A built-in Text-to-Speech feature powered by gTTS lets users listen to the summary rather than read it. Built on a Python and Flask backend with an HTML/CSS/JavaScript frontend, the system follows a modular pipeline intended to be easy to maintain and scale. Testing across a range of audio samples, including meeting recordings, lecture clips, and uploaded audio files, showed approximately 99% transcription accuracy on clear speech and 65–75% content compression in the generated summaries. The system is aimed at students, professionals, and researchers who need to extract key information from audio content quickly and in multiple languages.
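
The pipeline described in the abstract (transcribe, summarize, translate, then optionally speak) can be sketched roughly as follows. This is an illustration, not the authors' code: the function names, the `base` Whisper size, the `facebook/bart-large-cnn` checkpoint, the `Helsinki-NLP/opus-mt-en-<lang>` checkpoint pattern, the summary length limits, and the assumption that the source audio is in English are all choices made here for the example. Each stage is passed in as a plain callable so that any one of them can be swapped out or stubbed.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class PipelineResult:
    transcript: str
    summary: str
    translations: Dict[str, str] = field(default_factory=dict)


def run_pipeline(audio_path: str,
                 transcribe: Callable[[str], str],
                 summarize: Callable[[str], str],
                 translate: Callable[[str, str], str],
                 target_langs: Iterable[str]) -> PipelineResult:
    """Chain the stages: audio -> transcript -> summary -> translations."""
    transcript = transcribe(audio_path)
    summary = summarize(transcript)
    translations = {lang: translate(summary, lang) for lang in target_langs}
    return PipelineResult(transcript, summary, translations)


# Example backends. Heavy libraries are imported lazily so the module loads
# without them; model choices below are assumptions, not the paper's settings.

def whisper_transcribe(audio_path: str) -> str:
    import whisper  # provided by the openai-whisper package
    model = whisper.load_model("base")  # assumed model size
    return model.transcribe(audio_path)["text"]


def bart_summarize(text: str) -> str:
    from transformers import pipeline
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    # BART accepts a limited input length, so long transcripts would need
    # to be split into chunks before this call.
    out = summarizer(text, max_length=130, min_length=30, do_sample=False)
    return out[0]["summary_text"]


def marian_translate(text: str, lang: str) -> str:
    from transformers import MarianMTModel, MarianTokenizer
    name = f"Helsinki-NLP/opus-mt-en-{lang}"  # assumes English source text
    tokenizer = MarianTokenizer.from_pretrained(name)
    model = MarianMTModel.from_pretrained(name)
    batch = tokenizer([text], return_tensors="pt", padding=True, truncation=True)
    generated = model.generate(**batch)
    return tokenizer.decode(generated[0], skip_special_tokens=True)


def speak_summary(summary: str, out_path: str, lang: str = "en") -> None:
    from gtts import gTTS
    gTTS(text=summary, lang=lang).save(out_path)  # writes an MP3 file


# Usage with the example backends (downloads models on first run):
# result = run_pipeline("meeting.wav", whisper_transcribe, bart_summarize,
#                       marian_translate, ["fr", "es", "hi", "ml"])
```

Keeping the stages behind simple callables mirrors the modularity the abstract emphasizes: a Flask route could call `run_pipeline` with the real backends, while tests could pass lightweight stand-ins.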

Cite This Study

S et al. (Fri,) studied this question.

synapsesocial.com/papers/69b5ff8083145bc643d1c178 https://doi.org/10.5281/zenodo.18994488
