What question did this study set out to answer?

The aim is to improve long-form audio data processing using open-source tools for transcription and analysis.

March 12, 2026 | Open Access

Tutorial: Whisper Tools for Long-form Audio Data Processing

Key Points

The aim is to improve long-form audio data processing using open-source tools for transcription and analysis.
The tutorial presents preliminary experiments with the Whisper and WhisperX tools.
The experiments focus on Korean speech data.
The work addresses the challenges of audio that mixes several speech types (adult, child, and infant speech) with background noise and silence.
It offers initial insights into how well the Whisper tools perform on Korean audio data.
It highlights limitations of current transcription capabilities and identifies where improvement is needed.

Abstract

Long-form audio data processing is a challenging task because audio files can contain different types of speech (adult, child, and infant speech), background noise, and silence. Recording and processing software such as LENA (Gilkerson & Richards, 2020) can process such audio data. However, the tool is closed-source and supports only a limited number of processing tasks. LENA provides labels for speaker diarization (who speaks and when), but it does not transcribe the audio file into text. Transcription is an important step for various NLP tasks, such as morpho-syntactic and sentiment analysis. This tutorial attempts to bridge this gap by presenting preliminary experiments with open-source NLP and AI tools for audio processing, such as Whisper and WhisperX. We explore the challenges of applying these tools to Korean speech data and present initial results.
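To make the transcription step concrete, the following is a minimal sketch of how a Korean recording might be transcribed with the open-source openai-whisper Python package. It is an illustration only and not necessarily the pipeline used in the tutorial: the model size ("small") and the helper names (format_timestamp, format_segments, transcribe_korean) are assumptions chosen for this example. The sketch does not cover WhisperX or speaker diarization.

```python
from typing import Iterable, Mapping


def format_timestamp(seconds: float) -> str:
    """Render a time offset in seconds as HH:MM:SS (useful for long recordings)."""
    total = int(seconds)
    hours, rem = divmod(total, 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def format_segments(segments: Iterable[Mapping]) -> list[str]:
    """Turn Whisper-style segments (dicts with start, end, text) into readable lines."""
    return [
        f"[{format_timestamp(s['start'])} - {format_timestamp(s['end'])}] {s['text'].strip()}"
        for s in segments
    ]


def transcribe_korean(audio_path: str, model_name: str = "small") -> list[str]:
    """Transcribe one audio file with openai-whisper, forcing Korean decoding.

    Hypothetical helper for illustration; the model size is an assumption.
    """
    # Imported lazily so the formatting helpers work without the package.
    # Requires `pip install openai-whisper` and ffmpeg on the system path.
    import whisper

    model = whisper.load_model(model_name)
    # language="ko" skips automatic language detection and decodes as Korean.
    result = model.transcribe(audio_path, language="ko")
    return format_segments(result["segments"])
```

Calling transcribe_korean("recording.wav"), where the file name is a placeholder, would return one timestamped line per segment. Keeping the formatting separate from the model call makes it possible to inspect or post-process segments without re-running the model, which matters for recordings that are hours long.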
