Joint learning of reward machines and policies in environments with partially known semantics | Synapse