Demystifying Tensor Cores to Optimize Half-Precision Matrix Multiply | Synapse