Robotic manipulation in contact-rich environments requires integrating vision and tactile sensing, yet fusing these modalities effectively remains challenging. Existing methods often adopt symmetric feature fusion or joint policy learning, implicitly treating tactile sensing as a continuous motion generator comparable to vision. However, this is at odds with the nature of touch, which primarily provides physical constraints and phase cues rather than action proposals. In this work, we propose DPTG, a framework that reformulates tactile sensing as a physical feasibility constraint rather than a parallel action generator. Actions are sampled from a vision-driven diffusion policy and guided by a tactile feasibility classifier. To extract phase-relevant cues, we derive an adaptive guidance schedule from the feasibility scores, activating the constraints only when contact is informative. Moreover, the feasibility classifier is trained as a standalone module on interaction signals, which allows it to be reused across related tasks that share the same action space and tactile setup. Experiments in simulation and in the real world show that DPTG improves success rates while reducing peak forces, yielding safer and more stable contact interactions than the baselines.
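The idea of guiding vision-proposed actions with a tactile feasibility score can be illustrated with a minimal toy sketch in Python. This is not the DPTG implementation: the feasibility model (a sigmoid over a force margin), the parameters `stiffness`, `force_limit`, `sharpness`, `w_max`, and `step`, the `in_contact` flag, and all function names are assumptions introduced for illustration, and the deterministic update merely stands in for one reverse step of a diffusion sampler.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def feasibility(action, stiffness, force_limit, sharpness=1.0):
    """Toy feasibility score in (0, 1). Hypothetical stand-in for a learned
    tactile classifier: high when the contact force implied by the action
    (stiffness * ||action||) stays below force_limit."""
    force = stiffness * np.linalg.norm(action)
    return sigmoid(sharpness * (force_limit - force))


def log_feasibility_grad(action, stiffness, force_limit, sharpness=1.0):
    """Analytic gradient of log(feasibility) with respect to the action."""
    norm = np.linalg.norm(action)
    if norm == 0.0:
        return np.zeros_like(action)
    p = feasibility(action, stiffness, force_limit, sharpness)
    # d/da log(sigmoid(z)) = (1 - sigmoid(z)) * dz/da,
    # with z = sharpness * (force_limit - stiffness * ||a||).
    return -(1.0 - p) * sharpness * stiffness * action / norm


def guidance_weight(p_feasible, in_contact, w_max=1.0):
    """Assumed adaptive schedule: no guidance without contact; otherwise the
    weight grows as the feasibility score drops."""
    return w_max * (1.0 - p_feasible) if in_contact else 0.0


def guided_step(action, denoised, in_contact, stiffness, force_limit, step=0.5):
    """One simplified guided update: move toward the vision policy's proposal
    (`denoised`), then nudge along the feasibility gradient."""
    proposal = action + step * (denoised - action)
    p = feasibility(proposal, stiffness, force_limit)
    w = guidance_weight(p, in_contact)
    return proposal + w * log_feasibility_grad(proposal, stiffness, force_limit)


if __name__ == "__main__":
    a0 = np.zeros(2)
    vision_target = np.array([4.0, 0.0])  # hypothetical vision-policy output
    free = guided_step(a0, vision_target, False, stiffness=2.0, force_limit=1.0)
    touch = guided_step(a0, vision_target, True, stiffness=2.0, force_limit=1.0)
    print("no contact:", free)
    print("in contact:", touch)
```

In this toy setting the tactile term never proposes a motion of its own; it only attenuates a vision-proposed action when that action would push the implied force past the limit, and it is inactive when there is no contact.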
Liu et al. (Wed,) studied this question.