Graph Neural Networks (GNNs) have become pivotal for analyzing relational data in embedded intelligent systems such as IoT devices. However, their deployment on resource-constrained devices faces two critical barriers: traditional graph partitioning methods produce unbalanced computational loads because of their rigid granularity, and hardware mapping strategies use resources inefficiently under dynamic graph structures. These limitations conflict with the resource-efficiency and scalability requirements of embedded systems. To address them, we present GNNmap, a hardware-software co-design framework that combines multi-granular graph partitioning with topology-aware GNN mapping. The framework first reconstructs input graphs into balanced kernel groups comprising cohesive supernodes (each corresponding to a parallelizable subgraph). By combining coarse-grained partitioning with fine-grained optimization, GNNmap maintains load balance while substantially reducing cross-subgraph communication. Concurrently, a subgraph-to-processing-element (PE) mapping based on coarse-grained reconfigurable architectures (CGRAs) matches graphs to hardware efficiently by jointly modeling graph topological features and hardware resource constraints. By dynamically coordinating graph reorganization and hardware resource allocation, GNNmap resolves the intrinsic mismatch between irregular graph computations and static hardware configurations. Experimental results demonstrate that GNNmap outperforms existing works, improving inference performance by 1.47× to 62.8×, resource efficiency by 1.15× to 3.06×, and energy efficiency by 1.34× to 3.50×.
Fan et al. studied this question.
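The idea of pairing coarse-grained partitioning with fine-grained optimization can be illustrated with a minimal sketch. This is not GNNmap's algorithm, which the abstract does not specify; it is a generic stand-in that assumes a greedy balanced assignment for the coarse stage and a simple neighbor-majority move rule for the fine stage. All function names (`build_adj`, `coarse_partition`, `edge_cut`, `refine`) and the `slack` parameter are hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def build_adj(edges):
    """Build an undirected adjacency map {node: set(neighbors)} from an edge list."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return dict(adj)


def coarse_partition(adj, k):
    """Coarse stage (assumed): place nodes, highest degree first, into the
    currently lightest of k parts. This balances node counts but ignores topology."""
    load = [0] * k
    part = {}
    for v in sorted(adj, key=lambda n: (-len(adj[n]), n)):
        p = min(range(k), key=lambda i: (load[i], i))
        part[v] = p
        load[p] += 1
    return part


def edge_cut(adj, part):
    """Count edges whose endpoints lie in different parts (cross-subgraph traffic)."""
    return sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])


def refine(adj, part, k, slack=1):
    """Fine stage (assumed): move a node to the part holding strictly more of its
    neighbors, as long as part sizes stay within +/- slack of the even share.
    Every accepted move lowers the edge cut, so the loop terminates."""
    size = [0] * k
    for p in part.values():
        size[p] += 1
    cap = -(-len(adj) // k) + slack      # upper bound on part size
    floor = len(adj) // k - slack        # lower bound on part size
    improved = True
    while improved:
        improved = False
        for v in sorted(adj):
            counts = [0] * k
            for u in adj[v]:
                counts[part[u]] += 1
            cur = part[v]
            best = max(range(k), key=lambda i: (counts[i], -i))
            if (best != cur and counts[best] > counts[cur]
                    and size[best] < cap and size[cur] - 1 >= floor):
                size[cur] -= 1
                size[best] += 1
                part[v] = best
                improved = True
    return part


if __name__ == "__main__":
    # Two triangles joined by a single bridge edge (2, 3).
    edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
    adj = build_adj(edges)
    part = coarse_partition(adj, 2)
    print("cut after coarse stage:", edge_cut(adj, part))
    part = refine(adj, part, 2)
    print("cut after fine stage:", edge_cut(adj, part))
```

In this toy setup the coarse stage balances load without regard to topology, and the fine stage then trades a small, bounded amount of imbalance for fewer cross-part edges. A real system would also need to weigh per-node compute cost and hardware constraints, which this sketch leaves out.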