Interpreting Transformer Architectures as Implicit Multinomial Regression | Synapse