STFormer: Spatio‐temporal former for hand–object interaction recognition from egocentric RGB video