Textual Tokens Classification for Multi-Modal Alignment in Vision-Language Tracking