
LLM-Driven Autonomous Robotic Manipulation

RAS 545 · Final Project · Arizona State University · Fall 2025

Conversational robot control using Google Gemini 2.5 Flash: natural language commands drive a Dobot Magician Lite through a tool-augmented LLM architecture with vision-guided pick-and-place, RANSAC affine calibration, and HSV color segmentation — achieving 92% manipulation success across 50 trials.


Overview

This system bridges high-level natural language intent with low-level robot hardware using a 16-tool LLM agent powered by Google Gemini 2.5 Flash. The user types or speaks commands like "Pick up the small blue block and place it in the box on the right" — the LLM reasons over the scene, calls the appropriate tools, and the Dobot executes the physical task autonomously.

Key results:

  • 92% end-to-end pick-and-place success (46/50; 47/50 successful grasps, 46/47 successful placements)
  • 94.3% object detection reliability (66/70 blocks across 10 trials)
  • 98.7% LLM function call accuracy (152/154 correct tool invocations)
  • 2.87mm RMS calibration error across 300mm × 200mm workspace

Repository Structure

.
├── LLM_ROBOT.py                        # Main entry — conversational AI control loop
├── call_function.py                    # Tool dispatcher — 16+ function declarations
├── config.py                           # Central config — affine matrix, Z-heights, ports
├── affine_transfrom.py                 # RANSAC affine calibration (pixel → robot coords)
├── get_pixel_coords.py                 # Interactive calibration tool (click-to-capture)
├── Robot_Tools/
│   ├── Robot_Motion_Tools.py           # Low-level Dobot motion primitives
│   ├── Camera_Capture_Tools.py         # HSV color segmentation + blob detection
│   └── Pick_Place_Tool.py              # High-level 8-waypoint pick-and-place
├── Helper_Functions/
│   └── file_handling.py                # File I/O tools for LLM
├── capture_scene.json                  # Runtime scene memory (detected block locations)
├── Ras_FinalProject_Report.pdf         # Full project report (IEEE format)
└── README.md

System Architecture

User (natural language)
        ↓
Gemini 2.5 Flash — LLM Agent (up to 20 tool calls per input)
        ↓
call_function.py — Tool dispatcher
        ↓
┌───────────────────────────────────────┐
│  File Ops  │  Robot Motion  │  Vision │
│            │                │         │
│ get_files  │ move_to_home   │ capture │
│ read_file  │ pick_and_place │ detect  │
│ write_file │ suction_on/off │ segment │
└───────────────────────────────────────┘
        ↓
Dobot Magician Lite (serial /dev/ttyACM2)
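The dispatch layer in call_function.py routes each Gemini function call to a Python handler. A minimal sketch of that pattern, using two tool names from the diagram above (handler bodies are stubs here, and the `source`/`target` argument names are hypothetical; the real handlers drive the Dobot and camera):

```python
# Stub handlers standing in for the real robot-motion tools.
def move_to_home() -> str:
    return "moved to home pose"

def pick_and_place(source: str, target: str) -> str:
    return f"picked {source}, placed at {target}"

# Lookup table: tool name (as declared to the LLM) -> Python handler.
TOOL_TABLE = {
    "move_to_home": move_to_home,
    "pick_and_place": pick_and_place,
}

def dispatch(tool_name: str, **kwargs) -> str:
    """Route one LLM function call to its handler; report unknown tools."""
    handler = TOOL_TABLE.get(tool_name)
    if handler is None:
        return f"unknown tool: {tool_name}"
    return handler(**kwargs)
```

The string each handler returns is fed back to the LLM as the tool result, which is what lets it chain up to 20 calls per user input.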

Technical Details

Vision-Robot Calibration

A 9-point RANSAC-fitted affine transformation maps pixel coordinates to robot workspace coordinates:

M = [[ 0.00601, -0.48421, 380.653],
     [-0.46908,  0.00375, 155.350]]
| Metric             | Value           |
|--------------------|-----------------|
| Calibration points | 9               |
| Workspace coverage | 300 mm × 200 mm |
| RMS error          | 2.87 mm         |
| Mean error         | 2.65 mm         |
| Max error          | 4.21 mm         |
| RANSAC iterations  | 1000            |
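Applying the calibrated matrix is a single homogeneous multiply, robot = M · [u, v, 1]ᵀ. A dependency-free sketch using the matrix above (the function name is hypothetical):

```python
# 2x3 affine calibration matrix from above: [x, y] = M @ [u, v, 1]
M = [[ 0.00601, -0.48421, 380.653],
     [-0.46908,  0.00375, 155.350]]

def pixel_to_robot(u: float, v: float) -> tuple[float, float]:
    """Map a camera pixel (u, v) to robot-frame (x, y) in millimetres."""
    x = M[0][0] * u + M[0][1] * v + M[0][2]
    y = M[1][0] * u + M[1][1] * v + M[1][2]
    return x, y
```

Note that pixel (0, 0) maps to (380.653, 155.350): the translation column locates the camera origin in the robot frame.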

Computer Vision Pipeline

HSV color segmentation detects colored blocks:

| Color  | Hue range        | Notes                      |
|--------|------------------|----------------------------|
| Blue   | H ∈ [100°, 130°] | Most reliable              |
| Green  | H ∈ [40°, 80°]   |                            |
| Yellow | H ∈ [20°, 40°]   | Sensitive to warm lighting |

Pipeline: RGB → HSV → threshold → morphological opening (5×5) → connected components → centroid → capture_scene.json

Motion Planning

8-waypoint pick-and-place trajectory with Z-stratified collision avoidance:

| Parameter    | Value                   |
|--------------|-------------------------|
| z_above      | 100 mm (safe travel)    |
| z_table      | −45 mm (grasp contact)  |
| block_height | 40 mm                   |
| stack_delta  | 10 mm (stacking margin) |
| side_offset  | 10 mm (side-by-side)    |
| Motion mode  | MOVJ_XYZ                |
| Velocity     | 50 mm/s                 |
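A plausible reconstruction of the 8-waypoint trajectory under these parameters, assuming a hover, descend, grasp, lift, travel, descend, release, retreat ordering (the authoritative sequence lives in Pick_Place_Tool.py and may differ in detail):

```python
Z_ABOVE = 100.0   # safe travel height (mm), clears stacked blocks
Z_TABLE = -45.0   # grasp/release contact height (mm)

def pick_place_waypoints(pick, place):
    """Return 8 (x, y, z, action) waypoints for one pick-and-place cycle.

    All lateral travel happens at Z_ABOVE, so the Z-stratification alone
    prevents collisions with other blocks on the table.
    """
    px, py = pick
    qx, qy = place
    return [
        (px, py, Z_ABOVE, "hover over pick"),
        (px, py, Z_TABLE, "descend to block"),
        (px, py, Z_TABLE, "suction on"),
        (px, py, Z_ABOVE, "lift clear"),
        (qx, qy, Z_ABOVE, "travel at safe height"),
        (qx, qy, Z_TABLE, "descend to place"),
        (qx, qy, Z_TABLE, "suction off"),
        (qx, qy, Z_ABOVE, "retreat"),
    ]
```

For stacking, the descend height at the place site would be raised by block_height (plus stack_delta) per block already in the stack.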

LLM Integration

| Metric                   | Value                    |
|--------------------------|--------------------------|
| Model                    | Gemini 2.5 Flash         |
| Tools                    | 16 specialized functions |
| Max tool calls per input | 20                       |
| Function call accuracy   | 98.7% (152/154)          |
| Mean inference time      | 1850 ms ± 320 ms         |
| End-to-end task cycle    | 18.4 s ± 2.1 s           |

Setup & Usage

Requirements

pip install pydobot opencv-python google-generativeai numpy

Configuration

Update config.py with your calibrated affine matrix and device paths:

M = np.array([[0.00601, -0.48421, 380.653],
              [-0.46908,  0.00375, 155.350]])

default_port   = "/dev/ttyACM2"
camera_index   = 4

Calibrate (first time)

python get_pixel_coords.py   # collect correspondence points
python affine_transfrom.py   # compute affine matrix
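The core of the calibration fit is a least-squares solve for the 2×3 matrix from the point correspondences. A minimal sketch with NumPy, assuming the repo's data layout of paired (N, 2) pixel and robot arrays; the RANSAC outer loop (fit on random 3-point subsets, keep the model with the most inliers, 1000 iterations) is omitted here for brevity:

```python
import numpy as np

def fit_affine(pixels: np.ndarray, robots: np.ndarray) -> np.ndarray:
    """Least-squares fit of a 2x3 affine M so that robot ≈ M @ [u, v, 1].

    pixels: (N, 2) pixel coordinates; robots: (N, 2) robot coordinates; N >= 3
    non-collinear points. Returns M with shape (2, 3).
    """
    ones = np.ones((len(pixels), 1))
    A = np.hstack([pixels, ones])            # (N, 3) homogeneous pixel coords
    M_T, *_ = np.linalg.lstsq(A, robots, rcond=None)
    return M_T.T
```

With clean correspondences this recovers the matrix exactly; the RANSAC wrapper only matters when some clicked points are outliers.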

Run

python LLM_ROBOT.py

Example prompts:

Pick up the small blue block and place it in the box on the right.
Stack the green block on top of the blue block.
Move blue1 next to yellow1.
Pick up all blocks and sort them by color.

Results

| Task                               | Success rate        |
|------------------------------------|---------------------|
| Pick-and-place (50 trials)         | 92% (46/50)         |
| Object detection (10 trials)       | 94.3% (66/70)       |
| Explicit commands (6 tests)        | 100%                |
| Incomplete command clarification   | 100%                |
| Ambiguous command interpretation   | 75%                 |
| 3-block tower building (10 trials) | 80% complete towers |

Lessons Learned

  • Tool-augmented LLMs provide safe, interpretable robot control without custom training or fine-tuning
  • RANSAC affine calibration achieves sufficient accuracy (2.87mm RMS) for planar workspaces
  • HSV segmentation is fast but lighting-sensitive — warm lighting shifts yellow detection range
  • Z-stratified motion planning eliminates most collision failures with minimal complexity
  • LLM latency (1.85s) dominates conversational lag — local quantized models needed for industrial use

Course Info

  • Course: RAS 545 — Robotic and Autonomous Systems
  • Instructor: Prof. Sangram Redkar
  • University: Arizona State University, Tempe AZ
  • Semester: Fall 2025
