Project 59: VLM-Based Robot Task and Motion Planning
Contact Information
Prof. Wang Hesheng
Email: wanghesheng@sjtu.edu.cn
Project Description and Objectives
Conventional Task and Motion Planning (TAMP) approaches rely on manually designed interfaces that connect symbolic task planning with continuous motion generation. These domain-specific, labor-intensive modules generalize poorly to emerging tasks in real-world robot settings. The rapid development of large vision-language models (VLMs) offers new possibilities for solving TAMP problems in a zero-shot manner, without manually adapting the planning scheme or providing additional human demonstrations. This project aims to leverage the strong visual grounding and generalizable reasoning capabilities of VLMs to achieve zero-shot TAMP in real-world environments. To this end, students are required to study the fundamentals of VLMs and to conduct experimental validation on real-world robot platforms.
Eligibility Requirements
Proficiency in Python, C++, and ROS
Main Tasks
Designing a novel paradigm for task composition and motion trajectory generation with VLMs in real-world robot manipulation.
Conducting extensive experiments to demonstrate that the proposed VLM-based TAMP approach can achieve high success rates for multiple robot settings.