Poor-Supervised Evaluation for SuperLLM via Mutual Consistency | Synapse