Quantifying Variance in Evaluation Benchmarks | Synapse