Pretraining on the Test Set Is No Longer All You Need: A Debate-Driven Approach to QA Benchmarks | Synapse