Abstract Visual localization has shown promise for accurate Unmanned Aerial Vehicle (UAV) navigation in GNSS-denied environments due to its high spatial resolution. However, its performance degrades in low-light or texture-scarce conditions, and it incurs high computational costs. In contrast, non-visual sensors offer a lightweight, low-complexity alternative for localization under such conditions. This work introduces CLAK (CNN-LSTM-Attention-KAN), a deep learning framework that estimates global UAV positions (latitude, longitude, and elevation) from non-visual sensors such as LiDAR, a barometric altimeter, and an Inertial Measurement Unit (IMU). CLAK combines Convolutional Neural Networks (CNNs) for spatial encoding, Long Short-Term Memory (LSTM) layers for temporal modeling, an attention mechanism for feature prioritization, and Kolmogorov-Arnold Networks (KANs) for flexible nonlinear regression. The model is trained on synthetic UAV flight data generated with a ROS2 simulation framework that integrates Gazebo for 3D environment simulation and PX4 for autopilot control, with QGroundControl (QGC) handling mission planning and monitoring. Elevation data are extracted from a Digital Elevation Map (DEM) of the Taif region in Saudi Arabia, while ground-truth positions are derived from simulated onboard Global Positioning System (GPS) outputs. Simulation results across different flight trajectories show that CLAK achieves up to a 78.35% reduction in Mean Absolute Error (MAE) and a 75.40% reduction in Root Mean Squared Error (RMSE) compared to the LSTM baseline, while maintaining an average coefficient of determination (R²) of 0.998. These results demonstrate the scalability and precision of CLAK for UAV localization in GNSS-denied conditions.
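The abstract reports error reductions relative to an LSTM baseline without stating the formula. The minimal Python sketch below assumes that "reduction" means the relative decrease (baseline error − model error) / baseline error × 100, and that MAE, RMSE, and R² follow their standard definitions. The function names and the sample values are hypothetical illustrations, not code or data from the paper.

```python
import math
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> int:
    # Both series must be non-empty and aligned sample by sample.
    if len(y_true) == 0 or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and of equal length")
    return len(y_true)


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean Absolute Error: average of |y - y_hat|."""
    n = _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root Mean Squared Error: square root of the mean of (y - y_hat)^2."""
    n = _check(y_true, y_pred)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    n = _check(y_true, y_pred)
    mean = sum(y_true) / n
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when y_true is constant")
    return 1.0 - ss_res / ss_tot


def relative_reduction(baseline_error: float, model_error: float) -> float:
    """Percentage by which model_error is lower than baseline_error (assumed definition)."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return 100.0 * (baseline_error - model_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical elevation targets (m) and predictions; not data from the paper.
    y_true = [100.0, 102.0, 105.0, 110.0]
    baseline_pred = [103.0, 99.0, 108.0, 114.0]  # stand-in for an LSTM baseline
    model_pred = [100.5, 101.5, 105.5, 110.5]    # stand-in for an improved model

    mae_base, mae_model = mae(y_true, baseline_pred), mae(y_true, model_pred)
    rmse_base, rmse_model = rmse(y_true, baseline_pred), rmse(y_true, model_pred)
    print(f"MAE reduction:  {relative_reduction(mae_base, mae_model):.2f}%")
    print(f"RMSE reduction: {relative_reduction(rmse_base, rmse_model):.2f}%")
    print(f"R^2 of model predictions: {r2_score(y_true, model_pred):.4f}")
```

Because CLAK predicts three targets (latitude, longitude, and elevation), the "average" R² in the abstract presumably averages per-target scores; under that assumption, one would call `r2_score` on each target series separately and take the mean of the three values.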
Jarraya et al. studied this question.