Language Models can Evaluate Themselves via Probability Discrepancy | Synapse