From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions