Grounding Multimodal Large Language Models in Actions | Synapse