To establish robotic applications in human environments such as offices or private homes, robotic systems must be instructable by ordinary users in a natural way. In interpersonal communication, humans usually draw on several kinds of sensory information and are capable of integrating all perceptual cues quickly and consistently. Additionally, knowledge acquired during the communication process is used directly to resolve ambiguities. As a step towards realizing similar capabilities in automatic devices, this paper presents an integrated system combining automatic speech processing and image understanding. The system is intended to serve as an intelligent interface for a robot that manipulates objects in its surroundings according to a human's instructions. The enhanced capabilities necessary for carrying out a multimodal man-machine dialog are realized by combining statistical and declarative methods for inference and knowledge representation. The effectiveness of this approach is demonstrated using an exemplary dialog from our construction task domain.
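The abstract does not say how statistical and declarative knowledge are combined. Purely as an illustration of the general idea, the sketch below resolves a spoken reference such as "the red cube" against a visual scene. It assumes a hypothetical vision module that reports per-object attribute probabilities and a hypothetical knowledge base that marks objects as graspable. Each candidate is scored statistically (speech confidence times visual probability, with attributes treated as independent) and then filtered by a hard declarative rule. The names `DetectedObject`, `resolve_reference`, and the scene contents are hypothetical; this is not the authors' method.

```python
from dataclasses import dataclass, field


@dataclass
class DetectedObject:
    name: str                                        # hypothetical ID from the vision module
    attributes: dict = field(default_factory=dict)   # attribute -> visual probability
    graspable: bool = True                           # declarative fact from a hypothetical knowledge base


def resolve_reference(utterance_attrs, objects):
    """Pick the object that best matches a spoken description.

    utterance_attrs maps each attribute mentioned by the speaker to the
    confidence of the speech recognizer (assumed, not from the paper).
    Statistical part: multiply speech confidence by visual probability for
    every mentioned attribute, assuming independence.
    Declarative part: objects that violate a hard rule (not graspable) are
    discarded before scoring.
    """
    best, best_score = None, 0.0
    for obj in objects:
        if not obj.graspable:  # hard constraint: the robot can only take manipulable objects
            continue
        score = 1.0
        for attr, p_speech in utterance_attrs.items():
            # An attribute the vision module did not observe contributes probability 0.
            score *= p_speech * obj.attributes.get(attr, 0.0)
        if score > best_score:
            best, best_score = obj, score
    return best, best_score


# Hypothetical scene: the mounted block matches "red cube" best visually,
# but the declarative rule excludes it because it cannot be grasped.
scene = [
    DetectedObject("cube_1", {"red": 0.9, "cube": 0.8}),
    DetectedObject("bolt_2", {"red": 0.7, "bolt": 0.9}),
    DetectedObject("mounted_block", {"red": 0.95, "cube": 0.9}, graspable=False),
]

if __name__ == "__main__":
    best, score = resolve_reference({"red": 0.9, "cube": 0.85}, scene)
    print(best.name)  # -> cube_1
```

Without the graspability rule, `mounted_block` would receive the highest score; the example shows how a declarative constraint can override a purely statistical ranking.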
Bauckhage et al. studied this question.