Evaluating language models for mathematics through interactions | Synapse