Reducing Activation Recomputation in Large Transformer Models