ZeRO: Memory Optimizations Toward Training Trillion Parameter Models | Synapse