January 3, 2024

OmniVec: Learning robust representations with cross modal sharing

Abstract

Most research on learning-based methods has focused on designing and training networks for specific tasks. However, many learning-based tasks, across modalities, share commonalities and could potentially be tackled in a joint framework. We present an approach in this direction: learning multiple tasks, in multiple modalities, with a unified architecture. The proposed network is composed of task-specific encoders, a common trunk in the middle, and task-specific prediction heads. We first pre-train it with self-supervised masked training, then train it sequentially on the different tasks. We train the network on all major modalities, e.g. visual, audio, text and 3D, and report results on 22 diverse and challenging public benchmarks. We demonstrate empirically that using a joint network to train across modalities leads to meaningful information sharing, which allows us to achieve state-of-the-art results on most of the benchmarks. We also show that the trained network generalizes to cross-modal tasks as well as to unseen datasets and tasks.
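The layout described in the abstract (per-task encoders, one shared trunk, per-task heads) and the masking step used for pre-training can be illustrated with a toy sketch. This is not the authors' implementation: the single-layer linear maps, the dimensions, the modality and task names, the mask ratio, and the helpers `SharedTrunkModel` and `mask_tokens` are all hypothetical stand-ins chosen only to show how inputs from different modalities pass through the same trunk parameters.

```python
import random


def linear(in_dim, out_dim, rng):
    """Random weight matrix with out_dim rows and in_dim columns (toy stand-in for a layer)."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]


def apply(weights, vector, relu=True):
    """Matrix-vector product, optionally followed by ReLU."""
    out = [sum(w * x for w, x in zip(row, vector)) for row in weights]
    return [max(0.0, v) for v in out] if relu else out


class SharedTrunkModel:
    """Hypothetical sketch: one encoder per modality, one shared trunk, one head per task."""

    def __init__(self, input_dims, output_dims, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # Encoders map each modality's features into a common hidden size.
        self.encoders = {m: linear(d, hidden_dim, rng) for m, d in input_dims.items()}
        # The trunk is a single set of weights reused by every modality and task.
        self.trunk = linear(hidden_dim, hidden_dim, rng)
        # Heads map the shared representation to each task's output size.
        self.heads = {t: linear(hidden_dim, d, rng) for t, d in output_dims.items()}

    def forward(self, modality, task, features):
        hidden = apply(self.encoders[modality], features)
        hidden = apply(self.trunk, hidden)  # shared across all inputs
        return apply(self.heads[task], hidden, relu=False)


def mask_tokens(tokens, mask_ratio, rng, mask_value=0.0):
    """Replace a random fraction of tokens with mask_value; return masked list and indices."""
    n_masked = int(len(tokens) * mask_ratio)
    masked_idx = set(rng.sample(range(len(tokens)), n_masked))
    masked = [mask_value if i in masked_idx else t for i, t in enumerate(tokens)]
    return masked, sorted(masked_idx)


# Assumed toy dimensions and names, for illustration only.
model = SharedTrunkModel({"image": 6, "audio": 4}, {"image_cls": 3, "audio_cls": 2})
image_logits = model.forward("image", "image_cls", [0.5] * 6)
audio_logits = model.forward("audio", "audio_cls", [0.1] * 4)
masked, masked_idx = mask_tokens([1.0] * 10, 0.5, random.Random(0))
```

In this sketch both calls to `forward` reuse `model.trunk`, which is the mechanism through which information could be shared across modalities. A masked pre-training objective would ask the network to reconstruct the entries listed in `masked_idx` from the remaining ones; that training loop is omitted here.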

Authors

Siddharth Srivastava

University of Warwick

Gaurav Sharma

Indian Institute of Technology Kanpur

Institutions

Under Armour (United States)

Cite this study

Srivastava et al. (2024) studied this question.

synapsesocial.com/papers/69d8192ea2a48916bbbef04c — DOI: https://doi.org/10.1109/wacv57701.2024.00127

Also consider

Synapse has enriched one closely related paper. Consider it for comparative context:

UniVL: A Unified Video and Language Pre-Training Model for Multimodal Understanding and Generation · 2020 · 169 citations
