Sketch-based robot navigation offers an intuitive alternative to traditional control methods, allowing users to express complex spatial intent efficiently. This thesis addresses the novel problem of translating hand-drawn sketches into viable 3D drone flight paths when the sketches are drawn over first-person-view depth images, which makes them difficult to interpret. The proposed method, SketchPlan, is a diffusion-based planner composed of two modules: SketchAdapter, which learns to map human sketches to meaningful 2D path projections, and DiffPath, a diffusion model that uses depth images to infer accurate, collision-aware 3D paths from these projections. A new synthetic dataset of 32,000 simulated drone flight paths was generated in photorealistic 3D Gaussian Splatting environments and supplemented by a smaller set of human-annotated sketches. This partially labeled training approach helps bridge the gap between human intent and automatically generated path data. Experimental evaluation shows that SketchPlan generalizes from simulation to real-world drone navigation. In an unseen simulated environment, it significantly outperforms the ablated baselines, reducing both collision rate and path distance. In real-world testing, SketchPlan achieves a 100% success rate in low- to medium-clutter scenarios and a 40% success rate in highly cluttered environments, demonstrating its ability to understand and execute user intent. These results highlight the potential of diffusion-based models for interactive robot navigation, enabling real-time, intuitive human-robot interaction. This thesis opens new avenues for sketch-based control across robotic applications, particularly where conventional instruction methods fall short.
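The abstract leaves implicit how a 2D path projection relates to a 3D flight path, and why depth images are needed to recover one from the other. The following is a minimal sketch, not the thesis's implementation: it assumes an ideal pinhole camera with hypothetical intrinsics `fx`, `fy`, `cx`, `cy` and waypoints expressed in the camera frame (z pointing forward), and the helper names `project_path` and `lift_pixel` are illustrative only. It shows that projection discards depth, so distinct 3D waypoints on the same viewing ray land on the same pixel; a per-pixel depth value is what disambiguates them.

```python
from typing import List, Tuple


def project_path(points_cam: List[Tuple[float, float, float]],
                 fx: float, fy: float, cx: float, cy: float) -> List[Tuple[float, float]]:
    """Project 3D waypoints (camera frame, z forward) onto the image plane.

    Assumes an ideal pinhole camera with hypothetical intrinsics.
    """
    pixels = []
    for x, y, z in points_cam:
        if z <= 0:
            raise ValueError("waypoint is behind the camera")
        # Perspective division: depth information is lost here.
        pixels.append((fx * x / z + cx, fy * y / z + cy))
    return pixels


def lift_pixel(u: float, v: float, depth: float,
               fx: float, fy: float, cx: float, cy: float) -> Tuple[float, float, float]:
    """Back-project a pixel to 3D, given a depth value (e.g. read from a depth image)."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


if __name__ == "__main__":
    # Two waypoints on the same viewing ray project to the same pixel.
    path_2d = project_path([(1.0, 0.0, 2.0), (2.0, 0.0, 4.0)], 100, 100, 50, 50)
    print(path_2d)
    # With a depth value, the pixel can be lifted back to a unique 3D point.
    print(lift_pixel(100.0, 50.0, 4.0, 100, 100, 50, 50))
```

In this toy setting the ambiguity is resolved by reading a single depth value; a learned model such as the DiffPath module described above must instead produce a whole collision-aware 3D path consistent with both the 2D projection and the depth image.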
Sixten Norelius studied this question.