Indoor positioning systems based on received signal strength (RSS) achieve indoor positioning by leveraging the position-related features inherent in spatial RSS fingerprint images. Their positioning accuracy and robustness are directly influenced by the quality of fingerprint features. However, the inherent spatial low-resolution characteristic of spatial RSS fingerprint images makes it challenging to effectively extract subtle fingerprint features. To address this issue, this paper proposes an RSS-based indoor positioning method that combines enhanced spatial frequency fingerprint representation with fusion learning. First, bicubic interpolation is applied to improve image resolution and reveal finer spatial details. Then, a 2D fast Fourier transform (2D FFT) converts the enhanced spatial images into frequency domain representations to supplement spectral features. These spatial and frequency fingerprints are used as dual-modality inputs for a parallel convolutional neural network (CNN) model with efficient multi-scale attention (EMA) modules. The model extracts modality-specific features and fuses them to generate enriched representations. Each modality—spatial, frequency, and fused—is passed through a dedicated fully connected network to predict 3D coordinates. A coordinate optimization strategy is introduced to select the two most reliable outputs for each axis (x, y, z), and their average is used as the final estimate. Experiments on seven public datasets show that the proposed method significantly improves positioning accuracy, reducing the mean positioning error by up to 47.1% and root mean square error (RMSE) by up to 54.4% compared with traditional and advanced time–frequency methods.
Lai et al. (Tue,) studied this question.