In embodied intelligence, effectively integrating vision and touch is essential for dexterous manipulation. Because vision alone cannot reliably perceive contact states or material properties, we propose DeepTouch, a vision-tactile fusion framework that combines visuo-tactile perception, simulated and real-world data acquisition, and a vision-tactile-language-action (VTLA) control strategy. Together, these components form a concise technical framework for fine-grained dexterous manipulation.
Li et al. have also studied this problem.