SightSound-R1: Cross-Modal Reasoning Distillation from Vision to Audio Language Models | Synapse