ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models