What question did this study set out to answer?

To develop a photonic spiking deep deterministic policy gradient (spiking‐DDPG) architecture for energy-efficient reinforcement learning.

February 2, 2026

A Hardware‐Aware Photonic Spiking‐DDPG Reinforcement Learning Architecture for Continuous Control

Key Points

Proposed a spiking‐DDPG architecture implemented on a photonic spiking neuromorphic chip (PSNC).
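Utilized a Mach–Zehnder interferometer (MZI)‐based photonic synaptic array and distributed feedback laser with saturable absorber (DFB‐SA)‐based photonic spiking neuron arrays.
Combined a photonic spiking neural network (PSNN) Actor, deployed on the PSNC, with an artificial neural network (ANN)‐based Critic for reinforcement learning tasks.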
Tested on continuous control tasks: Pendulum-v1 and MountainCarContinuous-v0.
Achieved scores of -275 on Pendulum-v1 and 90 on MountainCarContinuous-v0.
Estimated energy consumption of 494.07 pJ per inference and a latency of 388.74 ps per inference.
Estimated energy consumption and latency are nearly an order of magnitude better than those of electronic counterparts.
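
As a rough sanity check on the two reported figures, the short Python sketch below derives the average power and the peak inference rate they would imply. Both derived quantities rest on assumptions that are not stated in the source: that the energy is spent evenly across a single inference window, and that inferences run back to back with no pipelining or idle time. The function names are illustrative only.

```python
# Figures reported in the summary above.
ENERGY_PER_INFERENCE_J = 494.07e-12   # 494.07 pJ per inference
LATENCY_PER_INFERENCE_S = 388.74e-12  # 388.74 ps per inference


def implied_average_power_w(energy_j: float, latency_s: float) -> float:
    """Average power, ASSUMING the energy is spent evenly over one inference."""
    return energy_j / latency_s


def implied_throughput_per_s(latency_s: float) -> float:
    """Inferences per second, ASSUMING back-to-back, non-overlapping inferences."""
    return 1.0 / latency_s


power_w = implied_average_power_w(ENERGY_PER_INFERENCE_J, LATENCY_PER_INFERENCE_S)
throughput = implied_throughput_per_s(LATENCY_PER_INFERENCE_S)

print(f"Implied average power: {power_w:.2f} W")
print(f"Implied peak rate:     {throughput:.3e} inferences/s")
```

Under those assumptions the figures correspond to roughly 1.27 W of average power and about 2.57 billion inferences per second. These are derived illustrations, not values reported by the study.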

Abstract

Reinforcement learning (RL) is vital for continuous decision‐making in tasks such as robotic control and autonomous driving, yet conventional electronic hardware suffers from high energy consumption and latency due to the von Neumann bottleneck. In this paper, we propose a photonic spiking deep deterministic policy gradient (spiking‐DDPG) RL architecture and demonstrate its hardware implementation on a photonic spiking neuromorphic chip (PSNC). The PSNC consists of a Mach–Zehnder interferometer (MZI)‐based photonic synaptic array and distributed feedback laser with saturable absorber (DFB‐SA)‐based photonic spiking neuron arrays, arranged symmetrically on both sides, enabling complete spiking neuron functionality and scalable photonic spiking neural networks (PSNNs). We deploy the PSNN Actor on the PSNC and combine it with an artificial neural network (ANN)‐based Critic to form the spiking‐DDPG architecture. On the Pendulum‐v1 and MountainCarContinuous‐v0 continuous control tasks, the scores achieved were −275 and 90, respectively. The estimated energy consumption is 494.07 pJ/inf with an inference latency of 388.74 ps/inf, nearly an order of magnitude better than electronic counterparts. These results demonstrate that the photonic spiking‐DDPG architecture enables ultrafast, energy‐efficient RL for continuous control, offering a promising route toward real‐time decision‐making in robotics and autonomous systems.
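
To make the Actor/Critic split concrete, here is a minimal software sketch in Python with NumPy. It is not the authors' implementation: a leaky integrate-and-fire (LIF) layer with rate decoding stands in for the DFB‐SA photonic neurons, a plain matrix multiply stands in for the MZI synaptic array, and every class name, layer size, and constant is a hypothetical choice made for illustration. Only the forward pass is shown; DDPG training (replay buffer, target networks, gradient updates) is omitted. The observation and action dimensions follow the Pendulum-v1 environment (three observation values, one torque bounded by ±2).

```python
import numpy as np

rng = np.random.default_rng(0)


def lif_layer(input_current, n_steps=20, threshold=1.0, leak=0.9):
    """Simulate LIF neurons for n_steps and return firing rates in [0, 1].

    Software stand-in (assumption) for the photonic spiking neurons.
    """
    v = np.zeros_like(input_current)
    spikes = np.zeros_like(input_current)
    for _ in range(n_steps):
        v = leak * v + input_current      # leaky integration of a constant input
        fired = v >= threshold
        spikes += fired                   # count spikes per neuron
        v = np.where(fired, 0.0, v)       # reset neurons that fired
    return spikes / n_steps


class SpikingActor:
    """Hypothetical spiking policy: observation -> spike rates -> bounded action."""

    def __init__(self, obs_dim, hidden_dim, act_dim, act_limit):
        self.w1 = rng.normal(0.0, 0.5, (obs_dim, hidden_dim))
        self.w2 = rng.normal(0.0, 0.5, (hidden_dim, act_dim))
        self.act_limit = act_limit

    def act(self, obs):
        rates = lif_layer(obs @ self.w1)
        # Decode firing rates into a continuous action within [-act_limit, act_limit].
        return self.act_limit * np.tanh(rates @ self.w2)


class AnnCritic:
    """Hypothetical conventional MLP that scores a (state, action) pair."""

    def __init__(self, obs_dim, act_dim, hidden_dim):
        self.w1 = rng.normal(0.0, 0.5, (obs_dim + act_dim, hidden_dim))
        self.w2 = rng.normal(0.0, 0.5, (hidden_dim, 1))

    def q_value(self, obs, action):
        x = np.concatenate([obs, action])
        hidden = np.maximum(x @ self.w1, 0.0)   # ReLU
        return float((hidden @ self.w2)[0])


# Pendulum-v1 style observation: cos(theta), sin(theta), angular velocity.
obs = np.array([1.0, 0.0, 0.5])
actor = SpikingActor(obs_dim=3, hidden_dim=16, act_dim=1, act_limit=2.0)
critic = AnnCritic(obs_dim=3, act_dim=1, hidden_dim=16)

action = actor.act(obs)
q = critic.q_value(obs, action)
print("action:", action, "Q-value:", q)
```

In this arrangement only the Actor would need to run on the low-latency hardware at decision time, while the Critic is used during training to evaluate the Actor's actions.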

Cite This Study

Zeng et al. studied this question.

synapsesocial.com/papers/6980ffa4c1c9540dea8124d2 https://doi.org/10.1002/lpor.202502481
