A bilingual benchmark for evaluating large language models | Synapse