On Memorization and Generalization in Compact Transformers | Synapse