Language-Driven Cross-Attention for Visible–Infrared Image Fusion Using CLIP | Synapse