How Reliable Are Automatic Evaluation Methods for Instruction-Tuned LLMs? | Synapse