From benchmarks to deployment: a comprehensive review of agentic AI evaluation | Synapse