Tackling Vision Language Tasks through Learning Inner Monologues | Synapse