Sparse Mixture-of-Experts Transformers for Efficient Scaling of Large Language Models | Synapse