Calibrating Multi-modal Representations: A Pursuit of Group Robustness without Annotations | Synapse