Beyond accuracy: Quantifying the reliability of multiple instance learning for whole slide image classification | Synapse