Optimizing the Structures of Transformer Neural Networks Using Parallel Simulated Annealing | Synapse