In this paper, we propose a novel algorithm to model and recognize sign language with a Kinect sensor. We assume that in a sign language video, some frames are expected to be both discriminative and representative. Under this assumption, each frame in the training videos is assigned a binary latent variable indicating its discriminative capability. A Latent Support Vector Machine model is then developed to classify the signs and to localize the discriminative and representative frames in the videos. In addition, we combine the depth map with the color image captured by the Kinect sensor to obtain more effective and accurate features and thereby improve recognition accuracy. To evaluate our approach, we collected an American Sign Language (ASL) dataset of approximately 2000 phrases; each phrase was captured with a Kinect sensor and therefore includes color, depth, and skeleton information. Experiments on our dataset demonstrate the effectiveness of the proposed method for sign language recognition.
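The role of the per-frame binary latent variable can be illustrated with a small sketch. The abstract does not give the model's exact form, so everything below is an assumption made for illustration: each sign class has a hypothetical linear weight vector, a video is a list of per-frame feature vectors, and the video score is the sum of frame scores over the frames whose latent variable is 1. Under that assumed form, the best latent assignment simply selects the frames with a positive score. The names `infer_latent`, `classify`, and `min_frames` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence, Tuple


def dot(w: Sequence[float], x: Sequence[float]) -> float:
    """Inner product of a weight vector and a frame feature vector."""
    return sum(a * b for a, b in zip(w, x))


def infer_latent(
    w: Sequence[float],
    frames: Sequence[Sequence[float]],
    min_frames: int = 1,
) -> Tuple[float, List[int]]:
    """Pick the binary latent assignment h that maximizes sum_i h[i] * (w . x_i).

    Assumed scoring form (not from the paper): with independent binary h[i],
    the maximum is reached by selecting every frame with a positive score.
    At least `min_frames` frames are always selected so that h is never empty.
    """
    scores = [dot(w, x) for x in frames]
    h = [1 if s > 0 else 0 for s in scores]
    if sum(h) < min_frames:
        # Too few positive frames: fall back to the top-scoring ones.
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        h = [0] * len(scores)
        for i in order[:min_frames]:
            h[i] = 1
    total = sum(s for s, hi in zip(scores, h) if hi)
    return total, h


def classify(
    models: Dict[str, Sequence[float]],
    frames: Sequence[Sequence[float]],
) -> Tuple[str, List[int]]:
    """Return the best-scoring sign label and the frames selected for it."""
    results = {label: infer_latent(w, frames) for label, w in models.items()}
    best = max(results, key=lambda label: results[label][0])
    return best, results[best][1]


# Toy example with two hypothetical sign classes and three 2-D frames.
models = {"hello": [1.0, 0.0], "thanks": [0.0, 1.0]}
frames = [[2.0, -1.0], [-1.0, 0.5], [3.0, -2.0]]
label, selected = classify(models, frames)
print(label, selected)
```

In this toy setting, the selected frames play the part of the localized discriminative frames, and the class label comes from the highest video score. Training the weights, which in a latent SVM typically alternates between inferring the latent variables and updating the weights, is not shown here.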
Sun et al. (Sun,) studied this question.