Local feature‐based video captioning with multiple classifier and CARU‐attention | Synapse