Autonomous systems rely heavily on visual perception to operate reliably in dynamic and unpredictable environments. As these systems move from passive observation toward active interaction, robust and adaptive visual tracking becomes essential for real-time performance. This thesis presents a series of contributions that strengthen the tracking capabilities of such systems, both in specialized contexts such as aerial cinematography and in general-purpose autonomous applications; it also presents a novel active vision method for improving the performance of computer vision tasks. First, a study focusing on Unmanned Aerial Vehicles (UAVs) presents a geometric modeling framework that relates desired shot types, UAV/camera trajectories, and camera focal length constraints so that reliable visual tracking is maintained, enabling intelligent on-the-fly cinematographic planning. Extending beyond UAVs, a long-term 2D visual tracking framework is developed to handle common challenges such as occlusions, fast motion, and temporary target disappearance. By dynamically adjusting the tracking model according to occlusion severity, it recovers the target without tracker re-initialization. To further enhance adaptability, an adversarial learning approach is proposed in which the tracker acts as a generator guided by a discriminator that evaluates how consistent the response map is with a target distribution, improving model precision while remaining lightweight enough for embedded systems. Additionally, a Robust Tracking Module (RTM) is introduced to increase resilience against input noise: it applies image-to-image translation to standardize input conditions and mitigate performance degradation under visual distortions. The effectiveness of this module is validated with an evaluation toolkit designed to benchmark tracking robustness across different noise types.
Finally, a hierarchical reward-based Reinforcement Learning (RL) framework is proposed that allows robotic systems to learn motion policies that optimize the performance of computer vision methods such as optical character recognition and face recognition. Together, these contributions deliver a comprehensive vision framework that improves the stability, adaptability, and reliability of visual tracking, with broad applicability across domains such as robotics and autonomous systems, surveillance, and smart vehicles, while retaining special relevance to the challenges of UAV-based autonomous cinematography.
Ιάσων Ευάγγελος Β. Καρακώστας studied this question.