CrossFormer: Cross-modal Representation Learning via Heterogeneous Graph Transformer | Synapse