Pitfalls in Evaluating Language Model Forecasters | Synapse