With the rapid growth of multimedia data, multimodal retrieval has become an important research field. Multimodal retrieval is a retrieval task that involves multiple media types, such as text, images, and audio. As data from many fields floods onto the Internet in varied forms such as video and pictures, single-modality retrieval can no longer meet the needs of information development, and the demand for multimodal retrieval is increasing. To address this problem, this paper surveys the literature and analyzes the research methods used in multimodal retrieval, including shared representation learning, deep learning multimodal fusion, and hashing methods, and summarizes the basic ideas researchers have used to solve these problems. Finally, it discusses future development directions and application prospects of multimodal retrieval. This paper aims to provide reference and inspiration for subsequent research and application of multimodal retrieval and to promote the development of multimodal retrieval technology.
Cheng et al. (Mon,) studied this question.