A multimodal learning framework using mel-spectrogram convolutional neural networks for English vocabulary acquisition | Synapse