This repository is a simple implementation of Depth Anything V2 with ONNX and TensorRT. The models are converted from the original PyTorch weights and can be used for image and video depth estimation.
```shell
git clone https://github.com/DepthAnything/Depth-Anything-V2
pip install -r requirements.txt
```

TensorRT version:

```
torch==1.13.0+cu114
torchvision==0.14.0+cu114
pycuda==2022.2.2
tensorrt==8.5.2.2
```

JetPack 5.0
Download the pre-trained model from Depth-Anything-V2 and put it under the Depth-Anything-V2/checkpoints directory.
```shell
python export.py --encoder vits --input-size 518
python onnx2trt.py -o models/depth_anything_v2_vits.onnx --output depth_anything_v2_vits.engine --workspace 2
```

Or you can download the converted model from Google Drive and put it under the `models` directory.
```shell
python infer.py \
    --input-path assets/demo01.jpg --input-type image \
    --mode onnx --encoder vits \
    --model_path models/depth_anything_v2_vits.onnx
```

To focus on a specific area, pass a crop region:
```shell
python infer.py \
    --input-path assets/demo01.jpg --input-type image \
    --mode onnx --encoder vits \
    --model_path depth_anything_v2_vits.onnx \
    --crop-region "0 550 800 750"
```

Options:
- `--input-path`: path to input image
- `--input-type`: input type, `image` or `video`
- `--mode`: inference mode, `onnx` or `trt`
- `--encoder`: encoder type, `vits`, `vitb`, `vitl`, `vitg`
- `--model_path`: path to model file
- `--crop-region`: crop region, `x y w h`
- `--output-path`: path to output image
- `--grayscale`: output grayscale image
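The preprocessing that `infer.py` performs before the model runs can be sketched as follows. This is a minimal NumPy-only illustration, not the actual script: the nearest-neighbour resize stands in for a proper `cv2.resize`, and the ImageNet mean/std constants are assumed from the standard Depth Anything pipeline.

```python
import numpy as np

# ImageNet mean/std (assumed, per the standard Depth Anything V2 pipeline)
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def preprocess(image: np.ndarray, input_size: int = 518) -> np.ndarray:
    """HWC uint8 RGB image -> 1x3xHxW float32 tensor for the ONNX model."""
    h, w, _ = image.shape
    # Nearest-neighbour resize via index sampling (real code would use cv2.resize)
    ys = (np.arange(input_size) * h // input_size).clip(0, h - 1)
    xs = (np.arange(input_size) * w // input_size).clip(0, w - 1)
    resized = image[ys[:, None], xs[None, :]].astype(np.float32) / 255.0
    normalized = (resized - MEAN) / STD          # per-channel normalization
    return normalized.transpose(2, 0, 1)[None]   # HWC -> NCHW, add batch dim

frame = np.random.randint(0, 256, (720, 1280, 3), dtype=np.uint8)
tensor = preprocess(frame)
print(tensor.shape)  # (1, 3, 518, 518)
```

With the tensor prepared this way, the ONNX path is a single `session.run` call in onnxruntime; the TensorRT path feeds the same buffer to the engine.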
```shell
python app.py
```

URL: http://127.0.0.1:7860
The easiest way to deploy is using Docker Compose, which automatically converts the ONNX model to TensorRT on first startup:
- Download the pre-trained model and convert to ONNX:
```shell
mkdir -p Depth-Anything-V2/checkpoints
wget -O Depth-Anything-V2/checkpoints/depth_anything_v2_vits.pth \
  "https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth"
python export.py --encoder vits --input-size 518
```

- Copy the ONNX model to the model repository:
```shell
cp models/depth_anything_v2_vits.onnx* model_repository/depth_anything/1/
```

- Start the Triton server (auto-converts to TensorRT on first run):
```shell
docker compose up -d
```

- Check the logs to monitor conversion progress:
```shell
docker compose logs -f
```

The server will be available at:
- HTTP: http://localhost:8000
- gRPC: localhost:8001
- Metrics: http://localhost:8002
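Since the first startup spends time converting the ONNX model to TensorRT, it can be handy to poll Triton's standard KServe health endpoints before sending requests. A small stdlib-only sketch, assuming the default HTTP port from the compose file:

```python
import urllib.error
import urllib.request
from typing import Optional

def triton_ready(host: str = "localhost", port: int = 8000,
                 model: Optional[str] = None) -> bool:
    """Return True once Triton (or a specific model) reports ready over HTTP.

    Uses the standard KServe endpoints /v2/health/ready and
    /v2/models/<name>/ready exposed by Triton's HTTP service.
    """
    path = f"/v2/models/{model}/ready" if model else "/v2/health/ready"
    url = f"http://{host}:{port}{path}"
    try:
        with urllib.request.urlopen(url, timeout=2.0) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # server not up yet, or model still converting

# e.g. poll triton_ready(model="depth_anything") after `docker compose up -d`
```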
```shell
mkdir -p model_repository/depth_anything/1
trtexec --onnx=models/depth_anything_v2_vits.onnx --saveEngine=model_repository/depth_anything/1/model.plan --fp16
polygraphy inspect model model_repository/depth_anything/1/model.plan
```

Create `config.pbtxt` under the `model_repository/depth_anything` directory:
```
name: "depth_anything"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 518, 518 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 518, 518 ]
  }
]
instance_group [ { count: 1, kind: KIND_GPU } ]
```

- Edge devices with NVIDIA GPU
```shell
sudo docker build -t tritonserver:v1 .
sudo docker run --runtime=nvidia --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
    -v $(pwd)/model_repository:/models tritonserver:v1
```

- Server with NVIDIA GPU (using Docker Compose)
```shell
docker compose up -d
```

The `compose.yml` uses `nvcr.io/nvidia/tritonserver:25.01-py3`, which includes TensorRT 10.8 with support for newer GPUs (including RTX 40/50 series with Compute Capability 8.9+/12.0).
```shell
python3 depth_anything_triton_infer.py --input_path assets/demo01.jpg --client_type http --model_name depth_anything
```

Note: Update the `server_url` parameter in the script or class initialization to match your server address (default is `localhost`).
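The Triton model returns a raw float depth map with dims `[1, 518, 518]` (per the `config.pbtxt` above); turning it into a viewable image is a simple min-max normalization. A sketch of that post-processing step (the function name is illustrative, not taken from the client script):

```python
import numpy as np

def depth_to_image(depth: np.ndarray, grayscale: bool = True) -> np.ndarray:
    """Min-max normalize a raw depth map to a uint8 image.

    depth: float array of shape (1, H, W) or (H, W), as returned by the
    depth_anything model (output dims [1, 518, 518]).
    """
    d = np.squeeze(depth).astype(np.float32)
    d = (d - d.min()) / (d.max() - d.min() + 1e-8)  # scale to [0, 1]
    img = (d * 255.0).astype(np.uint8)
    if grayscale:
        return img
    # A color version would apply a matplotlib/OpenCV colormap here;
    # stacking channels is just a placeholder for this sketch.
    return np.stack([img] * 3, axis=-1)

raw = np.random.rand(1, 518, 518).astype(np.float32)  # stand-in for server output
print(depth_to_image(raw).shape)  # (518, 518)
```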
Inside the container, run the following command to benchmark the model:
```shell
perf_analyzer -m depth_anything --shape input:1,3,518,518 --percentile=95 --concurrency-range 1:4
# or for the dynamic model
perf_analyzer -m depth_anything_dynamic --shape input:480,960,3 --percentile=95 --concurrency-range 1:4
```

DeepStream SDK provides a high-performance video analytics pipeline for running depth estimation on video streams with optimized GPU utilization.
- NVIDIA DeepStream SDK 8.0+ (DeepStream container recommended)
- TensorRT engine file (`.engine`)
- NVIDIA GPU with compute capability 5.0+
The simplest way to run depth estimation with DeepStream:
```shell
# Create output directory
mkdir -p output

# Run depth estimation on an image
docker run --rm \
    --device=/dev/nvidia0 \
    --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm \
    --device=/dev/nvidia-uvm-tools \
    --device=/dev/nvidia-modeset \
    -v ~/nvidia-tools:/nvidia:ro \
    -e LD_LIBRARY_PATH=/nvidia:/usr/local/cuda/lib64 \
    -v $(pwd):/workspace \
    -w /workspace/deepstream \
    nvcr.io/nvidia/deepstream:8.0-triton-multiarch \
    python3 deepstream_depth.py -i /workspace/assets/demo01.jpg -o /workspace/output/depth_output.jpg
```

The `deepstream/deepstream_depth.py` script provides a simple GStreamer pipeline for depth estimation:
```shell
# Basic usage
python3 deepstream_depth.py -i <input_image> -o <output_image>

# With custom config
python3 deepstream_depth.py -i input.jpg -o output.jpg -c config_infer_depth.txt
```

Options:
| Option | Description | Default |
|---|---|---|
| `-i, --input` | Input image file (JPEG) | Required |
| `-o, --output` | Output image file | `output/depth_output.jpg` |
| `-c, --config` | nvinfer config file | `config_infer_depth.txt` |
The `deepstream/` directory contains:

| File | Description |
|---|---|
| `config_infer_depth.txt` | nvinfer configuration for depth model |
| `deepstream_depth_config.txt` | Full DeepStream app configuration |
| `deepstream_depth.py` | Python script with GStreamer pipeline |
Key parameters in `config_infer_depth.txt`:

| Parameter | Value | Description |
|---|---|---|
| `network-mode` | 2 | FP16 precision (0=FP32, 1=INT8, 2=FP16) |
| `model-engine-file` | `../models/depth_anything_v2_vits.engine` | TensorRT engine path |
| `infer-dims` | `3;518;518` | Input dimensions (CHW) |
| `output-tensor-meta` | 1 | Output raw tensor for processing |
| `net-scale-factor` | 0.00392156862745098 | 1/255 normalization |
| `offsets` | `123.675;116.28;103.53` | ImageNet mean values |
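nvinfer combines the last two parameters using its documented scaling formula, `y = net-scale-factor * (x - offsets)`. Note this subtracts the ImageNet means (expressed in the 0-255 range) but does not divide by the per-channel std, a common simplification in DeepStream configs. A quick NumPy check of the values in the table:

```python
import numpy as np

# Values from config_infer_depth.txt
NET_SCALE_FACTOR = 0.00392156862745098  # 1/255
OFFSETS = np.array([123.675, 116.28, 103.53], dtype=np.float32)  # ImageNet means, 0-255 range

def nvinfer_preprocess(frame: np.ndarray) -> np.ndarray:
    """Replicate nvinfer's input scaling: y = net-scale-factor * (x - offsets)."""
    return (frame.astype(np.float32) - OFFSETS) * NET_SCALE_FACTOR

# A pixel equal to the mean maps to ~0, as expected
pixel = np.array([[[123.675, 116.28, 103.53]]], dtype=np.float32)
print(nvinfer_preprocess(pixel))
```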
```shell
# Convert ONNX to TensorRT engine
trtexec --onnx=models/depth_anything_v2_vits.onnx \
    --saveEngine=models/depth_anything_v2_vits.engine \
    --fp16 \
    --memPoolSize=workspace:2048M
```

For more complex pipelines with multiple sources:
```shell
docker run --rm \
    --device=/dev/nvidia0 --device=/dev/nvidiactl \
    --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools \
    -v $(pwd):/workspace \
    -w /workspace/deepstream \
    nvcr.io/nvidia/deepstream:8.0-triton-multiarch \
    deepstream-app -c deepstream_depth_config.txt
```

| Method | Framework | Batch Size | Latency (ms) | Throughput |
|---|---|---|---|---|
| Triton Server | TensorRT | 1 | ~5-8 | ~125-200 FPS |
| DeepStream | TensorRT/nvinfer | 1 | ~6-10 | ~100-166 FPS |
| Direct TensorRT | TensorRT | 1 | ~4-6 | ~166-250 FPS |
Benchmarks on RTX 5090, FP16, 518x518 input. Actual performance varies by hardware.
Notes:
- DeepStream adds overhead for video pipeline management but excels at multi-stream processing
- Triton provides better scalability for serving multiple clients
- Direct TensorRT has lowest latency for single-image inference
- Add UI for easy usage with crop region
- Deploy on Triton Inference Server
- Deploy with DeepStream SDK
```bibtex
@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}
```

Reference: Depth-Anythingv2-TensorRT-python








