In this paper we propose an efficient hardware architecture that implements the Q-Learning algorithm and is suitable for real-time applications. Its main features are low power consumption, high throughput, and limited use of hardware resources. We also propose a technique based on approximated multipliers to reduce the hardware complexity of the algorithm. We implemented the design on a Xilinx Zynq Ultrascale+ MPSoC ZCU106 Evaluation Kit and evaluated the implementation in terms of hardware resources, throughput, and power consumption. Compared with the state-of-the-art Q-Learning hardware accelerators presented in the literature, the architecture obtains better results in speed, power, and hardware resources. We present experiments using different sizes for the Q-Matrix and different wordlengths for the fixed-point arithmetic. With a Q-Matrix of size 8 × 4 (8-bit data) we achieved a throughput of 222 MSPS (Mega Samples Per Second) and a dynamic power consumption of 37 mW, while with a Q-Matrix of size 256 × 16 (32-bit data) we achieved a throughput of 93 MSPS and a power consumption of 611 mW. Because the accelerator requires only a small amount of hardware resources, our system is suitable for multi-agent IoT applications. Moreover, the architecture can be used to implement the SARSA (State-Action-Reward-State-Action) Reinforcement Learning algorithm with minor modifications.
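For readers unfamiliar with the algorithm, the following is a minimal software sketch of the standard tabular Q-Learning update and of the SARSA variant mentioned above. It is not the authors' hardware design: it uses floating-point arithmetic rather than the fixed-point and approximated-multiplier datapath of the paper, and the function names, the learning rate `alpha`, and the discount factor `gamma` are illustrative assumptions. The 8 × 4 table size simply mirrors the smallest Q-Matrix reported in the abstract.

```python
# Illustrative sketch only: textbook tabular updates in floating point.
# Names and default parameters are assumptions, not taken from the paper.

def make_q(n_states=8, n_actions=4):
    """Create a Q-Matrix of zeros with one row per state, one column per action."""
    return [[0.0] * n_actions for _ in range(n_states)]


def q_learning_update(q, s, a, r, s_next, alpha=0.5, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r + gamma * max(q[s_next])  # greedy value of the next state
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


def sarsa_update(q, s, a, r, s_next, a_next, alpha=0.5, gamma=0.9):
    """Q(s,a) <- Q(s,a) + alpha * (r + gamma * Q(s',a') - Q(s,a))."""
    target = r + gamma * q[s_next][a_next]  # value of the action actually taken next
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]
```

The two updates differ only in how the next-state value is chosen (a maximum over actions versus the value of the selected next action), which is consistent with the abstract's remark that SARSA needs only minor modifications.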
Sergio Spanò
G.C. Cardarilli
Luca Di Nunzio
IEEE Access
Technical University of Denmark
University of Rome Tor Vergata
Spanò et al. studied this question.
www.synapsesocial.com/papers/6a0a53e9a9576e6c7db4ec3b — DOI: https://doi.org/10.1109/access.2019.2961174