Learning with Fewer Bits Across Layers and Time in the Training of Foundation-Scale Transformers | Synapse