March 3, 2026 · Open Access

DiLLeMa: An extensible and scalable framework for distributed large language models (LLMs) inference on multi-GPU clusters

DiLLeMa supports distributed computing for large language models, improving inference speed and efficiency.
Performance tests indicate a notable improvement in processing time when large language models run across multiple GPUs.
The observational analysis focuses on the framework's scalability and extensibility, which support its use in a variety of computing environments.
The work highlights the significance of improved inference mechanisms, which offer considerable advantages for artificial intelligence applications.

Robby Ulung Pambudi

Sepuluh Nopember Institute of Technology

Ary Mazharuddin Shiddiqi

Sepuluh Nopember Institute of Technology

Royyana Muslim Ijtihadie

Sepuluh Nopember Institute of Technology

SoftwareX

SHILAP Revista de lepidopterología

Sepuluh Nopember Institute of Technology
