Unbiased Embodied Visual Representation Learning with Causal Inference and Cross-Modality Alignment | Synapse