Towards more realistic evaluation of LLM-based code generation: an experimental study and beyond | Synapse