May 1, 2002

CUAVE: A new audio-visual database for multimodal human-computer interface research

Abstract

Multimodal signal processing has become an important topic of research for overcoming certain problems of audio-only speech processing. Audio-visual speech recognition is one area with great potential. Difficulties due to background noise and multiple speakers are significantly reduced by the additional information that visual features provide. Despite a few efforts to create databases in this area, none has yet emerged as a standard for comparison, for several possible reasons. This paper introduces a new audio-visual database that is flexible and fairly comprehensive, yet easily available to researchers on a single DVD. The CUAVE database is a speaker-independent corpus of over 7,000 utterances of both connected and isolated digits. It is designed to meet several goals, which are discussed in this paper. The most notable are availability of the database, flexibility in the use of the audio-visual data, and realistic conditions in the recordings (such as speaker movement). Another important focus of the database is the inclusion of pairs of simultaneous speakers, making it the first documented database of this kind. The overall goal of this project is to facilitate more widespread audio-visual research through an easily available database. For information on obtaining CUAVE, please visit our webpage (http://ece.clemson.edu/speech).