Natural language robot manipulation via MCP

An integrated framework for vision-guided pick-and-place automation
Publication date
2026-05-07
Document type
Conference paper
Author
Gaida, Daniel
Nordhoff, Tim Yago
Organisational unit
TH Köln
DOI
10.24405/23185
URI
https://openhsu.ub.hsu-hh.de/handle/10.24405/23185
Conference
9th ML4CPS 2026 – Machine Learning for Cyber-Physical Systems  
Publisher
Universitätsbibliothek der HSU/UniBw H
Book title
Machine learning for cyber physical systems : proceedings of the conference ML4CPS 2026
First page
40
Last page
49
Is part of
https://openhsu.ub.hsu-hh.de/handle/10.24405/23181
Is supplemented by
https://github.com/dgaida/robot_mcp
Peer-reviewed
✅
Part of the university bibliography
No
File(s)
openHSU_23185.pdf (1.81 MB)
Additional Information
Language
English
Keyword
Model context protocol
Natural-language robot control
Computer Vision
Pick-and-place robotics
Cyber-physical systems
Abstract
This paper presents a unified framework for natural-language control of robotic manipulators based on the Model Context Protocol (MCP). The system integrates a large language model (LLM) with real-time perception, spatial reasoning, and robot execution, enabling users to command robots through unconstrained natural-language instructions. High-level requests are interpreted by an LLM with structured tool-calling capabilities and translated into executable actions provided by a modular set of MCP tools. The tools interface with a real-time environment layer that manages perception, world modelling, and manipulation, while platform-agnostic controllers enable deployment on multiple robot arms and support multimodal interaction through graphical user interface (GUI), speech, and command-line interfaces.
Using a NIRYO Ned2 robot arm, we evaluate the system on a diverse set of manipulation tasks requiring object identification, spatial reasoning, and multi-step action execution. Experiments demonstrate that the approach achieves reliable task completion despite challenges such as ambiguous object references and visually similar objects. The results highlight the feasibility of combining LLM-based reasoning with classical perception and control for robust, language-driven manipulation. All tools, controllers, and environment components are made publicly available at https://github.com/dgaida/robot_mcp.
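Illustrative sketch (not taken from the paper or its repository): the abstract describes an LLM emitting structured tool calls that a modular tool set turns into robot actions. The Python below shows one way such a dispatch step might look, assuming a tool call is a dictionary with a name and arguments and the world model is a simple label-to-position map. Every name here (World, TOOLS, tool, pick_and_place, dispatch) is hypothetical, and no real MCP library or robot API is used.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class World:
    """Toy world model (assumption): object label -> (x, y) position in metres."""
    objects: Dict[str, Tuple[float, float]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)


# Hypothetical tool registry: tool name -> callable.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn: Callable[..., str]) -> Callable[..., str]:
    """Register a function under its own name so a structured call can reach it."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def pick_and_place(world: World, label: str, x: float, y: float) -> str:
    """Pretend to move an object; a real tool would drive the robot arm instead."""
    if label not in world.objects:
        return f"error: no object named {label!r}"
    world.objects[label] = (x, y)
    world.log.append(f"moved {label} to ({x}, {y})")
    return "ok"


def dispatch(world: World, call: dict) -> str:
    """Execute one structured call of the assumed form {"name": ..., "arguments": {...}}."""
    fn = TOOLS.get(call["name"])
    if fn is None:
        return f"error: unknown tool {call['name']!r}"
    return fn(world, **call["arguments"])


# Stand-in for what an LLM might emit for "put the red cube at the front left".
world = World(objects={"red cube": (0.10, 0.20)})
call = {"name": "pick_and_place",
        "arguments": {"label": "red cube", "x": 0.30, "y": 0.05}}
result = dispatch(world, call)
```

Returning error strings rather than raising exceptions is a design choice in this sketch: it would let the language model read the failure and retry with a corrected call.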
Description
This contribution is part of the conference proceedings, which are licensed under the terms of the Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/)
Version
Published version
Access right on openHSU
Open access
