This work presents an 8-Mb embedded NOR Flash-based compute-in-memory (CIM) SoC supporting up to 4 MB of INT8 weights, fabricated in 55 nm CMOS, targeting energy-efficient edge AI inference. Our architecture integrates a voltage-sensing crossbar array with an optimized sampling scheme, achieving a 9× throughput improvement over conventional methods. A dedicated on-chip hardware accelerator combines data movement, CIM-based vector–matrix multiplication, and accelerated activation functions, reducing end-to-end inference latency by at least 4× compared to CPU baselines. Complementing the hardware, we develop a software stack that automates the translation of high-level neural network descriptions into flash-programming binaries, simplifying deployment. Experimental results demonstrate 0.16 TOPS at 21.4 mW (7.48 TOPS/W), 92.8% classification accuracy on a human-body detection convolutional neural network (CNN), and full end-to-end functionality, confirming NOR Flash as a scalable and commercially viable platform for practical analog edge AI systems.
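The reported efficiency follows directly from the throughput and power figures above. The short Python sketch below reproduces that arithmetic; the function names are illustrative and not part of the described software stack, and the per-operation energy is derived here from the reported figures rather than stated in the text.

```python
def tops_per_watt(throughput_tops: float, power_mw: float) -> float:
    """Energy efficiency in TOPS/W from throughput (TOPS) and power (mW)."""
    return throughput_tops / (power_mw / 1000.0)


def picojoules_per_op(efficiency_tops_per_w: float) -> float:
    """Energy per operation in pJ; 1 TOPS/W corresponds to 1 pJ per operation."""
    return 1.0 / efficiency_tops_per_w


# Figures reported in the text: 0.16 TOPS at 21.4 mW.
efficiency = tops_per_watt(0.16, 21.4)
energy_per_op = picojoules_per_op(efficiency)

print(f"{efficiency:.2f} TOPS/W")      # matches the reported 7.48 TOPS/W
print(f"{energy_per_op:.3f} pJ/op")    # derived value, not reported in the text
```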
Jin et al. (Sun,) studied this question.