RAS 545 · Final Project · Arizona State University · Fall 2025
Conversational robot control using Google Gemini 2.5 Flash: natural language commands drive a Dobot Magician Lite through a tool-augmented LLM architecture with vision-guided pick-and-place, RANSAC affine calibration, and HSV color segmentation — achieving 92% manipulation success across 50 trials.
This system bridges high-level natural language intent with low-level robot hardware using a 16-tool LLM agent powered by Google Gemini 2.5 Flash. The user types or speaks commands like "Pick up the small blue block and place it in the box on the right" — the LLM reasons over the scene, calls the appropriate tools, and the Dobot executes the physical task autonomously.
Key results:
- 92% end-to-end pick-and-place success (46/50; 47/50 grasps, 46/47 of those placed)
- 94.3% object detection reliability (66/70 blocks across 10 trials)
- 98.7% LLM function call accuracy (152/154 correct tool invocations)
- 2.87mm RMS calibration error across 300mm × 200mm workspace
```
.
├── LLM_ROBOT.py                 # Main entry — conversational AI control loop
├── call_function.py             # Tool dispatcher — 16+ function declarations
├── config.py                    # Central config — affine matrix, Z-heights, ports
├── affine_transfrom.py          # RANSAC affine calibration (pixel → robot coords)
├── get_pixel_coords.py          # Interactive calibration tool (click-to-capture)
├── Robot_Tools/
│   ├── Robot_Motion_Tools.py    # Low-level Dobot motion primitives
│   ├── Camera_Capture_Tools.py  # HSV color segmentation + blob detection
│   └── Pick_Place_Tool.py       # High-level 8-waypoint pick-and-place
├── Helper_Functions/
│   └── file_handling.py         # File I/O tools for LLM
├── capture_scene.json           # Runtime scene memory (detected block locations)
├── Ras_FinalProject_Report.pdf  # Full project report (IEEE format)
└── README.md
```
```
User (natural language)
        ↓
Gemini 2.5 Flash — LLM Agent (up to 20 tool calls per input)
        ↓
call_function.py — Tool dispatcher
        ↓
┌─────────────────────────────────────────┐
│  File Ops   │  Robot Motion   │ Vision  │
│             │                 │         │
│ get_files   │ move_to_home    │ capture │
│ read_file   │ pick_and_place  │ detect  │
│ write_file  │ suction_on/off  │ segment │
└─────────────────────────────────────────┘
        ↓
Dobot Magician Lite (serial /dev/ttyACM2)
```
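The dispatcher layer can be sketched as a simple name-to-callable registry: the LLM emits a function name plus keyword arguments, and `call_function.py` routes the call. This is a minimal illustration; the tool bodies and registry name here are placeholders, not the project's actual implementations.

```python
# Hypothetical sketch of the dispatcher pattern in call_function.py:
# the LLM returns a tool name plus kwargs; we look the name up in a
# registry of Python callables and invoke it.

def move_to_home() -> str:
    return "moved to home"          # stand-in for the real motion primitive

def suction_on() -> str:
    return "suction enabled"        # stand-in for the real gripper call

TOOL_REGISTRY = {
    "move_to_home": move_to_home,
    "suction_on": suction_on,
}

def dispatch(name: str, args: dict) -> str:
    """Route an LLM tool call to its Python implementation."""
    if name not in TOOL_REGISTRY:
        # Returning an error string lets the LLM recover conversationally
        # instead of crashing the control loop.
        return f"error: unknown tool '{name}'"
    return TOOL_REGISTRY[name](**args)
```

Returning error strings (rather than raising) keeps malformed tool calls inside the conversation loop, where the LLM can retry.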
9-point RANSAC affine transformation maps pixel coordinates to robot workspace:
```
M = [[ 0.00601, -0.48421, 380.653],
     [-0.46908,  0.00375, 155.350]]
```
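Applying the calibrated 2×3 matrix is a single matrix-vector product on the homogeneous pixel coordinate. A minimal sketch (the helper name `pixel_to_robot` is illustrative):

```python
import numpy as np

# Calibrated 2x3 affine matrix from above (pixel -> robot mm).
M = np.array([[ 0.00601, -0.48421, 380.653],
              [-0.46908,  0.00375, 155.350]])

def pixel_to_robot(u: float, v: float) -> tuple[float, float]:
    """Map an image pixel (u, v) to robot-frame (x, y) in millimetres."""
    x, y = M @ np.array([u, v, 1.0])   # homogeneous pixel coordinate
    return float(x), float(y)
```

Note the near-zero diagonal terms: the camera axes are roughly swapped relative to the robot frame, which the affine map absorbs automatically.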
| Metric | Value |
|---|---|
| Calibration points | 9 |
| Workspace coverage | 300mm × 200mm |
| RMS error | 2.87mm |
| Mean error | 2.65mm |
| Max error | 4.21mm |
| RANSAC iterations | 1000 |
HSV color segmentation detects colored blocks:
| Color | Hue range (OpenCV H, 0–179) | Notes |
|---|---|---|
| Blue | H ∈ [100, 130] | Most reliable |
| Green | H ∈ [40, 80] | — |
| Yellow | H ∈ [20, 40] | Sensitive to warm lighting |
Pipeline: RGB → HSV → threshold → morphological opening (5×5) → connected components → centroid → capture_scene.json
8-waypoint pick-and-place trajectory with Z-stratified collision avoidance:
| Parameter | Value |
|---|---|
| z_above | 100 mm (safe travel) |
| z_table | −45 mm (grasp contact) |
| block_height | 40 mm |
| stack_delta | 10 mm (stacking margin) |
| side_offset | 10 mm (side-by-side) |
| Motion mode | MOVJ_XYZ |
| Velocity | 50 mm/s |
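The Z-stratification idea is that all lateral travel happens at `z_above`, so the gripper can never sweep through standing blocks. A hypothetical sketch of the 8-waypoint sequence for a table-level pick (the exact waypoint ordering in `Pick_Place_Tool.py` may differ; suction toggles are shown as dwell points):

```python
# Z parameters from the table above (mm).
Z_ABOVE, Z_TABLE = 100.0, -45.0

def build_waypoints(pick, place):
    """Return the 8 (x, y, z) waypoints for a table-level pick-and-place."""
    px, py = pick
    qx, qy = place
    return [
        (px, py, Z_ABOVE),   # 1. approach above pick
        (px, py, Z_TABLE),   # 2. descend to grasp height
        (px, py, Z_TABLE),   # 3. suction on (dwell)
        (px, py, Z_ABOVE),   # 4. retreat to safe travel height
        (qx, qy, Z_ABOVE),   # 5. lateral travel above place
        (qx, qy, Z_TABLE),   # 6. descend to release height
        (qx, qy, Z_TABLE),   # 7. suction off (dwell)
        (qx, qy, Z_ABOVE),   # 8. retreat to safe travel height
    ]
```

For stacking, waypoint 6's Z would be raised by `block_height + stack_delta` per block already in the stack.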
| Metric | Value |
|---|---|
| Model | Gemini 2.5 Flash |
| Tools | 16 specialized functions |
| Max tool calls per input | 20 |
| Function call accuracy | 98.7% (152/154) |
| Mean inference time | 1850ms ± 320ms |
| End-to-end task cycle | 18.4s ± 2.1s |
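Each of the 16 tools is exposed to Gemini as a function declaration with a JSON-schema-style parameter spec. An illustrative declaration for a pick-and-place tool (the actual schemas live in `call_function.py`; field names here are examples, and the exact casing/format accepted may vary with the installed `google-generativeai` version):

```python
# Illustrative Gemini-style function declaration for one tool.
pick_and_place_decl = {
    "name": "pick_and_place",
    "description": "Pick up a detected block and place it at a target pose.",
    "parameters": {
        "type": "object",
        "properties": {
            "block_id": {
                "type": "string",
                "description": "Block label from capture_scene.json, e.g. 'blue1'",
            },
            "target_x": {"type": "number", "description": "Target x, robot mm"},
            "target_y": {"type": "number", "description": "Target y, robot mm"},
        },
        "required": ["block_id", "target_x", "target_y"],
    },
}
```

Descriptive `description` strings matter: they are the only documentation the model sees when deciding which tool to call and with what arguments.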
```
pip install pydobot opencv-python google-generativeai numpy
```

Update `config.py` with your calibrated affine matrix and device paths:

```python
M = np.array([[0.00601, -0.48421, 380.653],
              [-0.46908, 0.00375, 155.350]])
default_port = "/dev/ttyACM2"
camera_index = 4
```

Calibrate, then launch the control loop:

```
python get_pixel_coords.py   # collect correspondence points
python affine_transfrom.py   # compute affine matrix
python LLM_ROBOT.py
```

Example prompts:
```
Pick up the small blue block and place it in the box on the right.
Stack the green block on top of the blue block.
Move blue1 next to yellow1.
Pick up all blocks and sort them by color.
```
| Task | Success Rate |
|---|---|
| Pick-and-place (50 trials) | 92% (46/50) |
| Object detection (10 trials) | 94.3% (66/70) |
| Explicit commands (6 tests) | 100% |
| Incomplete command clarification | 100% |
| Ambiguous command interpretation | 75% |
| 3-block tower building (10 trials) | 80% complete towers |
- Tool-augmented LLMs provide safe, interpretable robot control without custom training or fine-tuning
- RANSAC affine calibration achieves sufficient accuracy (2.87mm RMS) for planar workspaces
- HSV segmentation is fast but lighting-sensitive — warm lighting shifts yellow detection range
- Z-stratified motion planning eliminates most collision failures with minimal complexity
- LLM latency (1.85s) dominates conversational lag — local quantized models needed for industrial use
- Course: RAS 545 — Robotic and Autonomous Systems
- Instructor: Prof. Sangram Redkar
- University: Arizona State University, Tempe AZ
- Semester: Fall 2025