Multi-head Reward Aggregation Guided by Entropy | Synapse