Extremely Large-Scale Multiple-Input Multiple-Output (XL-MIMO) is positioned as a transformative technology for sixth-generation (6G) networks, effectively turning base stations into high-resolution sensing and communication hubs. However, practical deployment of XL-MIMO is hindered by the “curse of dimensionality”: the prohibitive overhead of Channel State Information (CSI) sensing and feedback, together with the computational latency of processing massive antenna arrays. To resolve the conflict between high-resolution sensing requirements and limited bandwidth, this paper proposes a two-stage beamforming architecture that combines physics-aware dimensionality reduction with deep learning. First, exploiting the inherent sparsity of XL-MIMO channels in the angle-delay domain, we design a Spatial–Frequency Concentration Block (SFCB). This module acts as a hard-attention sensing mechanism: it reduces the dimensionality of raw CSI at the source, the User Equipment (UE), through feature extraction and adaptive energy truncation. Second, we develop an adaptable Direct Integrated Precoding Network (DIP-I). Departing from the conventional “sense-reconstruct-then-precode” paradigm, DIP-I learns an end-to-end mapping that directly regresses the optimal precoding matrix at the Base Station (BS), without an intermediate channel reconstruction step. Simulations using the COST 2100 and QuaDRiGa hybrid channel models show that, in a 512-antenna configuration, the proposed framework achieves high beamforming gain. It also reduces sensing data overhead and inference latency, offering a favorable trade-off between spectral efficiency and hardware resource consumption for future 6G integrated sensing and communication systems.
Wen et al. studied this question.
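To make the idea of angle-delay sparsity and adaptive energy truncation concrete, the following is a minimal NumPy sketch. It is not the SFCB described in the abstract. It assumes a uniform linear array, so that a plain DFT across antennas maps to angular bins, and it replaces the paper's feature extraction with a simple energy threshold. The function names, the `keep_fraction` parameter, and the synthetic on-grid multipath channel are all hypothetical and chosen for illustration only.

```python
import numpy as np


def synthetic_channel(n_sub, n_ant, n_paths, seed=0):
    """Hypothetical toy channel: a few on-grid paths, shape (n_sub, n_ant)."""
    rng = np.random.default_rng(seed)
    delays = rng.choice(n_sub, size=n_paths, replace=False)   # delay-tap index per path
    angles = rng.choice(n_ant, size=n_paths, replace=False)   # angular-bin index per path
    gains = rng.standard_normal(n_paths) + 1j * rng.standard_normal(n_paths)
    n = np.arange(n_sub)[:, None]
    m = np.arange(n_ant)[None, :]
    h = np.zeros((n_sub, n_ant), dtype=complex)
    for g, d, a in zip(gains, delays, angles):
        # Frequency response of a delay tap times a ULA-like steering vector.
        h += g * np.exp(-2j * np.pi * n * d / n_sub) * np.exp(2j * np.pi * m * a / n_ant)
    return h


def to_angle_delay(h_sf):
    """Map spatial-frequency CSI (subcarriers x antennas) to the angle-delay domain."""
    h_delay = np.fft.ifft(h_sf, axis=0, norm="ortho")   # subcarriers -> delay taps
    return np.fft.fft(h_delay, axis=1, norm="ortho")    # antennas -> angular bins


def energy_truncate(h_ad, keep_fraction=0.99):
    """Keep the fewest coefficients whose energy reaches keep_fraction of the total."""
    flat = h_ad.ravel()
    energy = np.abs(flat) ** 2
    order = np.argsort(energy)[::-1]                    # strongest coefficients first
    cumulative = np.cumsum(energy[order])
    k = int(np.searchsorted(cumulative, keep_fraction * cumulative[-1])) + 1
    kept = order[: min(k, flat.size)]
    return kept, flat[kept]                             # flat indices and their values


if __name__ == "__main__":
    h = synthetic_channel(n_sub=32, n_ant=512, n_paths=4)
    idx, vals = energy_truncate(to_angle_delay(h))
    print(f"kept {idx.size} of {h.size} coefficients")
```

In this toy setting the channel has only a handful of significant angle-delay coefficients, so the UE would feed back a short list of index-value pairs instead of the full 32 × 512 matrix. Real channels have off-grid paths and near-field effects that spread energy across neighboring bins, which is presumably where a learned concentration block earns its keep.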