FPGAs have become key players in data centers. However, integrating such accelerators poses several challenges related to Quality of Service (QoS). Herein, we propose a compiler-based toolchain that increases FPGA flexibility by enabling dynamic, stateful HW-SW migration. Task migration is instrumental in addressing at least two of these challenges on FPGAs: fault tolerance and preemption. Building on our toolchain, we therefore design a checkpointing-rollback framework that restores a task after a component failure (i.e., an FPGA crash), taking a first step toward fault-tolerant data-center systems. Starting from the Xilinx HLS compilation workflow, we design 1) a set of LLVM optimization passes that instrument an FPGA-accelerated application with migration points, and 2) an asynchronous periodic data backup scheme for efficient context transfers. Together, these allow an FPGA-accelerated task to migrate statefully from the FPGA to its host CPU, where execution resumes. We evaluate this proposal on several applications and show that, although reliability inevitably comes at a cost, our framework offers promising results: it transforms a set of common benchmark kernels into fault-tolerant kernels with acceptable best-case runtime overheads (e.g., 1.1x for the Gaussian Blur filter and the MaxPool operation). We also show that FPGA task preemption allows HW-SW multitasking.
France-Pillois et al. studied this question.
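To make the checkpointing-rollback idea in the abstract concrete, the following is a minimal host-only sketch in C++17, not the authors' toolchain. It assumes a toy summation kernel whose live state is a loop index and an accumulator; the names `Context`, `run_on_device`, `resume_on_host`, and `run_with_rollback` are hypothetical. The "device" is simulated on the CPU, and the backup is written synchronously here, whereas the scheme described above is asynchronous and the migration points are inserted by LLVM passes rather than by hand.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical live state captured at a migration point.
struct Context {
    std::size_t next_index = 0;    // first element not yet processed
    std::int64_t partial_sum = 0;  // accumulator value at the checkpoint
};

// Stand-in for the accelerated kernel (simulated on the CPU): sums `data`
// and writes a checkpoint into `backup` every `period` iterations.
// `fail_at` simulates a device crash just before that index is processed.
// Returns the final sum, or std::nullopt if the simulated device failed.
std::optional<std::int64_t> run_on_device(const std::vector<std::int64_t>& data,
                                          std::size_t period,
                                          std::optional<std::size_t> fail_at,
                                          Context& backup) {
    assert(period > 0);
    std::int64_t sum = 0;
    for (std::size_t i = 0; i < data.size(); ++i) {
        if (i % period == 0) {
            backup = Context{i, sum};  // migration point: save live state
        }
        if (fail_at && *fail_at == i) {
            return std::nullopt;       // simulated crash; work since the
        }                              // last checkpoint is lost
        sum += data[i];
    }
    return sum;
}

// Host-side rollback: resume the same computation from the last checkpoint.
std::int64_t resume_on_host(const std::vector<std::int64_t>& data,
                            const Context& backup) {
    std::int64_t sum = backup.partial_sum;
    for (std::size_t i = backup.next_index; i < data.size(); ++i) {
        sum += data[i];
    }
    return sum;
}

// Try the device first; on failure, migrate statefully to the host.
std::int64_t run_with_rollback(const std::vector<std::int64_t>& data,
                               std::size_t period,
                               std::optional<std::size_t> fail_at) {
    Context backup;
    const auto result = run_on_device(data, period, fail_at, backup);
    return result ? *result : resume_on_host(data, backup);
}
```

In this sketch, the checkpoint period sets the trade-off the abstract alludes to: a shorter period costs more backup work during fault-free execution, while a longer period means more iterations must be recomputed on the host after a failure.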