Evaluating large language models for accuracy incentivizes hallucinations | Synapse