RAS 545 · Final Project · Arizona State University · Fall 2025
Conversational robot control using Google Gemini 2.5 Flash: natural language commands drive a Dobot Magician Lite through a tool-augmented LLM architecture with vision-guided pick-and-place, RANSAC affine calibration, and HSV color segmentation — achieving 92% manipulation success across 50 trials.
This system bridges high-level natural language intent with low-level robot hardware using a 16-tool LLM agent powered by Google Gemini 2.5 Flash. The user types or speaks commands like "Pick up the small blue block and place it in the box on the right" — the LLM reasons over the scene, calls the appropriate tools, and the Dobot executes the physical task autonomously.
Key results:
- 92% end-to-end pick-and-place success (46/50; 47/50 grasps, 46/47 of those placed)
- 94.3% object detection reliability (66/70 blocks across 10 trials)
- 98.7% LLM function call accuracy (152/154 correct tool invocations)
- 2.87mm RMS calibration error across 300mm × 200mm workspace
```
.
├── LLM_ROBOT.py                 # Main entry — conversational AI control loop
├── call_function.py             # Tool dispatcher — 16+ function declarations
├── config.py                    # Central config — affine matrix, Z-heights, ports
├── affine_transfrom.py          # RANSAC affine calibration (pixel → robot coords)
├── get_pixel_coords.py          # Interactive calibration tool (click-to-capture)
├── Robot_Tools/
│   ├── Robot_Motion_Tools.py    # Low-level Dobot motion primitives
│   ├── Camera_Capture_Tools.py  # HSV color segmentation + blob detection
│   └── Pick_Place_Tool.py       # High-level 8-waypoint pick-and-place
├── Helper_Functions/
│   └── file_handling.py         # File I/O tools for LLM
├── capture_scene.json           # Runtime scene memory (detected block locations)
├── Ras_FinalProject_Report.pdf  # Full project report (IEEE format)
└── README.md
```
```
User (natural language)
        ↓
Gemini 2.5 Flash — LLM Agent (up to 20 tool calls per input)
        ↓
call_function.py — Tool dispatcher
        ↓
┌─────────────────────────────────────────┐
│  File Ops   │  Robot Motion   │ Vision  │
│             │                 │         │
│ get_files   │ move_to_home    │ capture │
│ read_file   │ pick_and_place  │ detect  │
│ write_file  │ suction_on/off  │ segment │
└─────────────────────────────────────────┘
        ↓
Dobot Magician Lite (serial /dev/ttyACM2)
```
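The dispatcher layer can be sketched as a simple name-to-callable registry: the LLM emits a function name plus keyword arguments, and `call_function.py` routes the call. This is a minimal illustration; the tool bodies and registry name here are placeholders, not the project's actual implementations.

```python
# Hypothetical sketch of the dispatcher pattern in call_function.py:
# the LLM returns a tool name plus kwargs; we look the name up in a
# registry of Python callables and invoke it.

def move_to_home() -> str:
    return "moved to home"          # stand-in for the real motion primitive

def suction_on() -> str:
    return "suction enabled"        # stand-in for the real gripper call

TOOL_REGISTRY = {
    "move_to_home": move_to_home,
    "suction_on": suction_on,
}

def dispatch(name: str, args: dict) -> str:
    """Route an LLM tool call to its Python implementation."""
    if name not in TOOL_REGISTRY:
        # Returning an error string lets the LLM recover conversationally
        # instead of crashing the control loop.
        return f"error: unknown tool '{name}'"
    return TOOL_REGISTRY[name](**args)
```

Returning error strings (rather than raising) keeps malformed tool calls inside the conversation loop, where the LLM can retry.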
9-point RANSAC affine transformation maps pixel coordinates to robot workspace:
```
M = [[ 0.00601, -0.48421, 380.653],
     [-0.46908,  0.00375, 155.350]]
```
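Applying the calibrated 2×3 matrix is a single matrix-vector product on the homogeneous pixel coordinate. A minimal sketch (the helper name `pixel_to_robot` is illustrative):

```python
import numpy as np

# Calibrated 2x3 affine matrix from above (pixel -> robot mm).
M = np.array([[ 0.00601, -0.48421, 380.653],
              [-0.46908,  0.00375, 155.350]])

def pixel_to_robot(u: float, v: float) -> tuple[float, float]:
    """Map an image pixel (u, v) to robot-frame (x, y) in millimetres."""
    x, y = M @ np.array([u, v, 1.0])   # homogeneous pixel coordinate
    return float(x), float(y)
```

Note the near-zero diagonal terms: the camera axes are roughly swapped relative to the robot frame, which the affine map absorbs automatically.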
| Metric | Value |
|---|---|
| Calibration points | 9 |
| Workspace coverage | 300mm × 200mm |
| RMS error | 2.87mm |
| Mean error | 2.65mm |
| Max error | 4.21mm |
| RANSAC iterations | 1000 |
HSV color segmentation detects colored blocks:
| Color | Hue range (OpenCV H, 0–179) | Notes |
|---|---|---|
| Blue | H ∈ [100, 130] | Most reliable |
| Green | H ∈ [40, 80] | — |
| Yellow | H ∈ [20, 40] | Sensitive to warm lighting |
Pipeline: RGB → HSV → threshold → morphological opening (5×5) → connected components → centroid → capture_scene.json
8-waypoint pick-and-place trajectory with Z-stratified collision avoidance:
| Parameter | Value |
|---|---|
| z_above | 100 mm (safe travel) |
| z_table | −45 mm (grasp contact) |
| block_height | 40 mm |
| stack_delta | 10 mm (stacking margin) |
| side_offset | 10 mm (side-by-side) |
| Motion mode | MOVJ_XYZ |
| Velocity | 50 mm/s |
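The Z-stratification idea is that all lateral travel happens at `z_above`, so the gripper can never sweep through standing blocks. A hypothetical sketch of the 8-waypoint sequence for a table-level pick (the exact waypoint ordering in `Pick_Place_Tool.py` may differ; suction toggles are shown as dwell points):

```python
# Z parameters from the table above (mm).
Z_ABOVE, Z_TABLE = 100.0, -45.0

def build_waypoints(pick, place):
    """Return the 8 (x, y, z) waypoints for a table-level pick-and-place."""
    px, py = pick
    qx, qy = place
    return [
        (px, py, Z_ABOVE),   # 1. approach above pick
        (px, py, Z_TABLE),   # 2. descend to grasp height
        (px, py, Z_TABLE),   # 3. suction on (dwell)
        (px, py, Z_ABOVE),   # 4. retreat to safe travel height
        (qx, qy, Z_ABOVE),   # 5. lateral travel above place
        (qx, qy, Z_TABLE),   # 6. descend to release height
        (qx, qy, Z_TABLE),   # 7. suction off (dwell)
        (qx, qy, Z_ABOVE),   # 8. retreat to safe travel height
    ]
```

For stacking, waypoint 6's Z would be raised by `block_height + stack_delta` per block already in the stack.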
| Metric | Value |
|---|---|
| Model | Gemini 2.5 Flash |
| Tools | 16 specialized functions |
| Max tool calls per input | 20 |
| Function call accuracy | 98.7% (152/154) |
| Mean inference time | 1850ms ± 320ms |
| End-to-end task cycle | 18.4s ± 2.1s |
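Each of the 16 tools is exposed to Gemini as a function declaration with a JSON-schema-style parameter spec. An illustrative declaration for a pick-and-place tool (the actual schemas live in `call_function.py`; field names here are examples, and the exact casing/format accepted may vary with the installed `google-generativeai` version):

```python
# Illustrative Gemini-style function declaration for one tool.
pick_and_place_decl = {
    "name": "pick_and_place",
    "description": "Pick up a detected block and place it at a target pose.",
    "parameters": {
        "type": "object",
        "properties": {
            "block_id": {
                "type": "string",
                "description": "Block label from capture_scene.json, e.g. 'blue1'",
            },
            "target_x": {"type": "number", "description": "Target x, robot mm"},
            "target_y": {"type": "number", "description": "Target y, robot mm"},
        },
        "required": ["block_id", "target_x", "target_y"],
    },
}
```

Descriptive `description` strings matter: they are the only documentation the model sees when deciding which tool to call and with what arguments.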
```
pip install pydobot opencv-python google-generativeai numpy
```

Update `config.py` with your calibrated affine matrix and device paths:

```python
M = np.array([[0.00601, -0.48421, 380.653],
              [-0.46908, 0.00375, 155.350]])
default_port = "/dev/ttyACM2"
camera_index = 4
```

Calibrate, then launch the control loop:

```
python get_pixel_coords.py   # collect correspondence points
python affine_transfrom.py   # compute affine matrix
python LLM_ROBOT.py
```

Example prompts:
```
Pick up the small blue block and place it in the box on the right.
Stack the green block on top of the blue block.
Move blue1 next to yellow1.
Pick up all blocks and sort them by color.
```
| Task | Success Rate |
|---|---|
| Pick-and-place (50 trials) | 92% (46/50) |
| Object detection (10 trials) | 94.3% (66/70) |
| Explicit commands (6 tests) | 100% |
| Incomplete command clarification | 100% |
| Ambiguous command interpretation | 75% |
| 3-block tower building (10 trials) | 80% complete towers |
- Tool-augmented LLMs provide safe, interpretable robot control without custom training or fine-tuning
- RANSAC affine calibration achieves sufficient accuracy (2.87mm RMS) for planar workspaces
- HSV segmentation is fast but lighting-sensitive — warm lighting shifts yellow detection range
- Z-stratified motion planning eliminates most collision failures with minimal complexity
- LLM latency (1.85s) dominates conversational lag — local quantized models needed for industrial use
- Course: RAS 545 — Robotic and Autonomous Systems
- Instructor: Prof. Sangram Redkar
- University: Arizona State University, Tempe AZ
- Semester: Fall 2025