DiLLeMa: An extensible and scalable framework for distributed large language models (LLMs) inference on multi-GPU clusters | Synapse