Beyond Accuracy: Behavioral Testing of NLP Models with CheckList | Synapse