September 12, 2024

Robust Framework for Scalable AI Inference Using Distributed Cloud Services and Event-driven Architecture

Yash Jani (Lam Research, United States), Arth Jani (Amazon, United States)

Abstract

Deploying and scaling AI models in production makes it difficult to keep inference efficient while remaining cost-effective and scalable. This paper proposes a robust framework built on AWS services: Elastic Beanstalk, SQS, GPU-equipped EC2 instances, EventBridge, Step Functions, and API Gateway, combined with a hybrid mix of spot and on-demand instances. Elastic Beanstalk handles incoming requests and routes them to SQS for asynchronous processing. API Gateway manages rate limiting, while EventBridge and Step Functions scale the infrastructure dynamically. EC2 instances are pre-configured with AI scripts via EC2 Image Builder; they retrieve requests from SQS, run models from Hugging Face, and store the results in Cloudflare. This architecture ensures high availability and responsiveness while optimizing performance and cost. Comprehensive experiments demonstrate the framework's effectiveness under varying inference demand, showing significant improvements in scalability, resource utilization, and operational efficiency.
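The asynchronous flow described in the abstract (enqueue a request, let a worker pick it up, store the result) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: an in-memory `queue.Queue` stands in for SQS, a plain dictionary stands in for the result store, `run_model` is a placeholder for real model inference, and `plan_capacity` shows one hypothetical way to split capacity between on-demand and spot instances.

```python
import math
import queue
from dataclasses import dataclass


@dataclass
class InferenceRequest:
    request_id: str
    prompt: str


def run_model(prompt: str) -> str:
    """Placeholder for model inference (hypothetical stand-in)."""
    return prompt.upper()


def drain_queue(requests: "queue.Queue[InferenceRequest]", results: dict) -> int:
    """Process every queued request and store each result by request ID."""
    processed = 0
    while True:
        try:
            req = requests.get_nowait()
        except queue.Empty:
            return processed
        results[req.request_id] = run_model(req.prompt)
        processed += 1


def plan_capacity(queue_depth: int, per_instance: int, on_demand_baseline: int) -> tuple[int, int]:
    """Split the instances needed for a backlog into (on_demand, spot).

    Assumed policy: a fixed on-demand baseline always stays up, and spot
    instances absorb any demand above that baseline.
    """
    needed = max(on_demand_baseline, math.ceil(queue_depth / per_instance))
    return on_demand_baseline, needed - on_demand_baseline


# Usage: the front end enqueues requests, a worker drains them later.
pending = queue.Queue()  # stand-in for the SQS queue
pending.put(InferenceRequest("r1", "hello"))
pending.put(InferenceRequest("r2", "world"))
store = {}  # stand-in for the result store
handled = drain_queue(pending, store)
```

Decoupling the request front end from the GPU workers through a queue means the backlog depth becomes a natural scaling signal, which is the role the abstract assigns to EventBridge and Step Functions.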

Cite This Study

Jani et al. (2024) studied this question.

synapsesocial.com/papers/68e58ba1b6db64358752721f https://doi.org/10.21203/rs.3.rs-4909036/v1
