April 1, 2015

Librispeech: An ASR corpus based on public domain audio books

Vassil Panayotov (Johns Hopkins University), Guoguo Chen (New England Biolabs (China)), Daniel Povey (Xiaomi (China))

Abstract

This paper introduces a new corpus of read English speech, suitable for training and evaluating speech recognition systems. The LibriSpeech corpus is derived from audiobooks that are part of the LibriVox project, and contains 1000 hours of speech sampled at 16 kHz. We have made the corpus freely available for download, along with separately prepared language-model training data and pre-built language models. We show that acoustic models trained on LibriSpeech achieve lower error rates on the Wall Street Journal (WSJ) test sets than models trained on WSJ itself. We are also releasing Kaldi scripts that make it easy to build these systems.
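Speech recognition error rates like the ones compared above are conventionally reported as word error rate (WER). WER is the word-level edit distance between a system's hypothesis and the reference transcript (substitutions + deletions + insertions), divided by the number of reference words. The following is a minimal Python sketch of that metric. It assumes plain whitespace tokenization and no text normalization. The function name `word_error_rate` is hypothetical, and this is an illustration of the metric, not the authors' scoring pipeline.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (S + D + I) / N via word-level Levenshtein distance.

    Assumption: simple whitespace tokenization; real scoring setups
    usually normalize case and punctuation before comparing.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # dp[i][j] = minimum number of edits turning ref[:i] into hyp[:j]
    dp = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dp[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dp[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dp[i][j] = min(
                dp[i - 1][j] + 1,         # deletion
                dp[i][j - 1] + 1,         # insertion
                dp[i - 1][j - 1] + cost,  # substitution (or match)
            )
    return dp[len(ref)][len(hyp)] / len(ref)


if __name__ == "__main__":
    ref = "the cat sat on the mat"
    hyp = "the cat sit on mat"
    # one substitution (sat -> sit) + one deletion (the) over 6 words
    print(f"WER = {word_error_rate(ref, hyp):.3f}")
```

Running the example prints `WER = 0.333`. Because insertions are counted, WER can exceed 1.0 when a hypothesis is much longer than its reference.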

DOI: https://doi.org/10.1109/icassp.2015.7178964