RSTNet: Captioning with Adaptive Attention on Visual and Non-Visual Words | Synapse