Dynamic Reward-Guided with Multi-Head Attention for Actor-Critic Policy Learning Optimization | Synapse