Learning and Fusing Multimodal Deep Features for Acoustic Scene Categorization | Synapse