March 3, 2026

Data-Driven Resource Management of Reconfigurable Multi-Accelerator Systems in the Cloud-Edge Continuum

Key points

Improved resource management reduces execution time by up to 11% on FPGA systems, yielding proportional energy savings.
The methodology uses incremental machine learning models that update at run time, keeping modeling overhead under 5% during operation.
An adaptive scheduling strategy, utilizing the Crow Search Algorithm, balances performance and resource fairness in task allocation.
The open-source system integrates virtualization, monitoring, and prediction for seamless operation across various FPGA architectures.

Abstract

This Thesis addresses the integration and management of reconfigurable resources in the cloud-edge continuum. It focuses on reconfigurable multi-accelerator systems on Field-Programmable Gate Arrays (FPGAs), where dynamic partial reconfiguration (DPR) makes it possible to exploit data-level parallelism (multiple replicas) and task-level parallelism (multiple tasks). The Thesis introduces an infrastructure that deploys and monitors dynamic workloads on heterogeneous FPGA nodes, from low-end boards to cloud-grade devices, without adjustments between platforms. The infrastructure extends the ARTICo³ framework with support for cloud devices and multi-user execution; employs a client-server model that coordinates FPGA acceleration among multiple users; and packages accelerators and software in containers orchestrated with Kubernetes and Liqo to manage task movement and scaling transparently under latency, performance, or power constraints. It includes a monitoring framework that generates power and performance traces without introducing performance penalties in the system. To address the interaction between kernels running in parallel, this Thesis proposes a run-time characterization methodology that trains data-driven models to predict power consumption and performance under the effects of inter-kernel interaction. It uses incremental machine learning (ML) models that are updated while the system runs, avoiding full retraining when system conditions change. Training of these models is managed by a dedicated orchestration mechanism that limits their impact on the system, reducing modeling overhead from >20% in continuous learning alternatives to <5% while keeping prediction accuracy within 4% of the continuous approach.
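To make the idea of incremental model updates concrete, the following is a minimal sketch of a predictor that learns from one trace sample at a time instead of being retrained from scratch. It is not the Thesis's model: the class name `OnlineLinearModel`, the two features (normalized input size and normalized number of co-running kernels), the learning rate, and the synthetic trace are all assumptions chosen for illustration, and the update rule is plain stochastic gradient descent on a linear model.

```python
from typing import List, Sequence


class OnlineLinearModel:
    """Hypothetical linear predictor updated one sample at a time (plain SGD)."""

    def __init__(self, n_features: int, learning_rate: float = 0.05) -> None:
        self.weights: List[float] = [0.0] * n_features
        self.bias: float = 0.0
        self.learning_rate = learning_rate

    def predict(self, features: Sequence[float]) -> float:
        return self.bias + sum(w * x for w, x in zip(self.weights, features))

    def update(self, features: Sequence[float], target: float) -> float:
        """Take one gradient step on the squared error; return the pre-update error."""
        error = self.predict(features) - target
        for i, x in enumerate(features):
            self.weights[i] -= self.learning_rate * error * x
        self.bias -= self.learning_rate * error
        return error


# Synthetic stand-in for monitoring traces (assumed relation, not measured data):
# execution time grows with input size and with the number of co-running kernels.
model = OnlineLinearModel(n_features=2, learning_rate=0.1)
grid = [0.0, 0.25, 0.5, 0.75, 1.0]
for _ in range(500):
    for input_size in grid:
        for co_runners in grid:
            observed_time = 1.0 + 2.0 * input_size + 0.5 * co_runners
            model.update([input_size, co_runners], observed_time)

print(model.predict([0.5, 0.5]))
```

Because each call to `update` touches only the current sample, the cost per new trace is constant, which is the property that lets an orchestrator decide when training is worth its overhead.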
The models are device-agnostic and have been validated on boards with diverse architectures and power-measurement capabilities, requiring only minor device-specific tuning. Building on these predictions, an adaptive, conflict-aware workload optimization methodology treats task scheduling as a multi-objective optimization problem over discrete FPGA resources. The strategy employs the Crow Search Algorithm (CSA) metaheuristic, adapted to the discrete nature of the FPGA scheduling problem, to explore the solution space, trading off makespan, energy, and fairness. Candidate solutions are evaluated with the run-time models, which account for both standalone behavior and interference-induced slowdowns. A comprehensive sensitivity analysis guides parameter settings to meet different operational goals. Even under worst-case conditions, the scheduler reduces total execution time by up to 11% relative to non-adaptive baselines, yielding proportional energy savings that are essential on resource-constrained edge devices, and it significantly lowers waiting times when fair use of resources is prioritized. By combining virtualization, monitoring, modeling, and scheduling, this Thesis delivers an open-source, end-to-end approach that abstracts hardware details and scales across the cloud-edge continuum. Container-based virtualization enables seamless migration and transparent use of remote resources. The monitoring layer provides the run-time traces needed to understand system behavior. The characterization layer turns traces into accurate predictions of power and performance. Finally, the scheduler uses those predictions to guide run-time decisions that minimize interaction penalties.
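The abstract does not specify how CSA is adapted to discrete resources, so the sketch below shows one possible discretization rather than the Thesis's scheduler. In it, a crow's position is a task-to-slot assignment, and "flying towards" another crow's memory means copying each task's slot from that memory with probability min(1, r × flight length). The function names, the parameter defaults, and `toy_cost` (a weighted sum of makespan and energy that stands in for the run-time models) are all assumptions.

```python
import random
from typing import Callable, List, Sequence

Schedule = List[int]  # schedule[t] = slot index assigned to task t


def toy_cost(schedule: Sequence[int], durations: Sequence[float],
             slot_power: Sequence[float], energy_weight: float = 0.1) -> float:
    """Hypothetical stand-in for the run-time models: makespan plus weighted energy."""
    load = [0.0] * len(slot_power)
    energy = 0.0
    for task, slot in enumerate(schedule):
        load[slot] += durations[task]
        energy += durations[task] * slot_power[slot]
    return max(load) + energy_weight * energy


def discrete_csa(cost: Callable[[Sequence[int]], float], n_tasks: int, n_slots: int,
                 n_crows: int = 20, iterations: int = 200,
                 awareness_prob: float = 0.1, flight_length: float = 2.0,
                 seed: int = 0) -> Schedule:
    """Crow Search over task-to-slot assignments (assumed discretization)."""
    rng = random.Random(seed)

    def random_schedule() -> Schedule:
        return [rng.randrange(n_slots) for _ in range(n_tasks)]

    positions = [random_schedule() for _ in range(n_crows)]
    memories = [p[:] for p in positions]          # best schedule each crow has seen
    memory_cost = [cost(m) for m in memories]

    for _ in range(iterations):
        for i in range(n_crows):
            j = rng.randrange(n_crows)  # crow i follows a randomly chosen crow j
            if rng.random() >= awareness_prob:
                # Crow j is unaware: crow i moves towards j's memory by copying
                # each task's slot with probability min(1, r * flight_length).
                copy_prob = min(1.0, rng.random() * flight_length)
                new_pos = [memories[j][t] if rng.random() < copy_prob
                           else positions[i][t] for t in range(n_tasks)]
            else:
                # Crow j is aware and misleads crow i: jump to a random schedule.
                new_pos = random_schedule()
            positions[i] = new_pos
            new_cost = cost(new_pos)
            if new_cost < memory_cost[i]:
                memories[i], memory_cost[i] = new_pos[:], new_cost

    best = min(range(n_crows), key=memory_cost.__getitem__)
    return memories[best]


durations = [4.0, 3.0, 3.0, 2.0, 2.0, 1.0]   # illustrative task lengths
slot_power = [1.0, 1.5]                      # illustrative per-slot power
best = discrete_csa(lambda s: toy_cost(s, durations, slot_power),
                    n_tasks=len(durations), n_slots=len(slot_power))
print(best, toy_cost(best, durations, slot_power))
```

A fairness term, or a cost function backed by learned interference predictions, could replace `toy_cost` without changing the search loop, since the algorithm only needs a scalar cost per candidate schedule.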
Extensive validation, from dwarf-inspired High-Level Synthesis (HLS) benchmarks to integrated use cases, shows that the methodology sustains its benefits without per-platform tuning, supports both high-performance and resource-constrained scenarios, and offers a foundation for practical, portable FPGA acceleration in modern cloud-edge continuum environments.


Cite This Study

Juan Encinas Anchústegui studied this question.

synapsesocial.com/papers/69a75ba7c6e9836116a23659
