On the Evaluation of Dialogue Systems with Next Utterance Classification