Beyond statistical significance: Quantifying uncertainty and statistical variability in multilingual and multitask NLP evaluation | Synapse