ABSTRACT: The growing volume of spoken audio from online meetings, lectures, and podcasts is difficult to search, review, and share. This paper presents an AI-Based Voice Summarizer and Language Translation System, a web-based platform that converts raw audio into structured, readable text through a sequence of processing stages. OpenAI's Whisper model transcribes the speech, the BART transformer model condenses the transcript into a concise summary, and Helsinki-NLP's MarianMT models translate that summary into French, Spanish, Hindi, and Malayalam. A Text-to-Speech feature based on gTTS lets users listen to the summary instead of reading it. The backend is written in Python with Flask and the frontend in HTML, CSS, and JavaScript. Each stage is a separate module in the pipeline, a design intended to make the system easy to maintain and scale. Tests on a range of audio samples, including meeting recordings, lecture clips, and user-uploaded audio files, showed approximately 99% transcription accuracy on clear speech and 65–75% content compression in the generated summaries. The system is intended for students, professionals, and researchers who need to extract key information from audio content quickly and in multiple languages.
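The processing chain described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The names `Stages`, `Result`, `run_pipeline`, `compression_ratio`, `build_model_stages`, and `speak` are hypothetical. The Whisper model size (`base`), the BART checkpoint (`facebook/bart-large-cnn`), and the `Helsinki-NLP/opus-mt-en-{lang}` model IDs are assumed choices. "Compression" is assumed to mean the fraction of transcript words removed by summarization. Each stage is passed in as a callable, so the real models can be replaced by stubs when testing the control flow.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Stages:
    """Callables for each pipeline stage; real models or test stubs can be plugged in."""
    transcribe: Callable[[str], str]        # audio path -> transcript
    summarize: Callable[[str], str]         # transcript -> summary
    translate: Callable[[str, str], str]    # (summary, target language code) -> translation


@dataclass
class Result:
    transcript: str
    summary: str
    translations: Dict[str, str] = field(default_factory=dict)
    compression: float = 0.0


def compression_ratio(transcript: str, summary: str) -> float:
    """Fraction of words removed by summarization (assumed definition of 'compression')."""
    n_in = len(transcript.split())
    if n_in == 0:
        return 0.0
    return 1.0 - len(summary.split()) / n_in


def run_pipeline(audio_path: str, stages: Stages, target_langs: List[str]) -> Result:
    """Transcribe, then summarize, then translate the summary into each target language."""
    transcript = stages.transcribe(audio_path)
    summary = stages.summarize(transcript)
    translations = {lang: stages.translate(summary, lang) for lang in target_langs}
    return Result(transcript, summary, translations, compression_ratio(transcript, summary))


def build_model_stages(target_langs: List[str]) -> Stages:
    """Wire real models (needs openai-whisper and transformers; not called in the demo)."""
    import whisper
    from transformers import pipeline

    asr = whisper.load_model("base")  # model size is an assumption
    summarizer = pipeline("summarization", model="facebook/bart-large-cnn")
    translators = {
        lang: pipeline("translation", model=f"Helsinki-NLP/opus-mt-en-{lang}")
        for lang in target_langs
    }
    return Stages(
        transcribe=lambda path: asr.transcribe(path)["text"],
        # BART accepts roughly 1024 input tokens; long transcripts would need chunking first.
        summarize=lambda text: summarizer(
            text, max_length=130, min_length=30, do_sample=False
        )[0]["summary_text"],
        translate=lambda text, lang: translators[lang](text)[0]["translation_text"],
    )


def speak(text: str, out_path: str, lang: str = "en") -> None:
    """Save spoken audio for `text` with gTTS (needs the gTTS package and network access)."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)


if __name__ == "__main__":
    # Stub stages exercise the control flow without downloading any models.
    stub = Stages(
        transcribe=lambda path: "the team agreed to ship the release on friday after final testing",
        summarize=lambda text: "release ships friday",
        translate=lambda text, lang: f"[{lang}] {text}",
    )
    result = run_pipeline("meeting.wav", stub, ["fr", "es", "hi", "ml"])
    print(result.summary, f"{result.compression:.0%}")  # prints: release ships friday 75%
```

Keeping model loading in `build_model_stages`, separate from `run_pipeline`, means a Flask app (not shown) could load the models once at startup. Each request handler would then only save the uploaded file and call `run_pipeline` with those preloaded stages.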