CriticBench: Evaluating Large Language Models as Critic