
Depth Anything V2 - ONNX and TensorRT Triton Inference Server

Introduction

This repository is a simple implementation of Depth Anything V2 with ONNX and TensorRT. The model is converted from the original PyTorch model and can be used for image and video depth estimation.

Installation

git clone https://github.com/DepthAnything/Depth-Anything-V2
pip install -r requirements.txt

TensorRT version:

  • torch==1.13.0+cu114
  • torchvision==0.14.0+cu114
  • pycuda==2022.2.2
  • tensorrt==8.5.2.2
  • JetPack 5.0

Convert model

ONNX

Download the pre-trained model from Depth-Anything-V2 and put it under the Depth-Anything-V2/checkpoints directory.

python export.py --encoder vits --input-size 518

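For reference, the export step reduces to tracing the PyTorch model with torch.onnx.export. The sketch below is an assumption about what export.py does; the constructor arguments follow the upstream repository's vits configuration, and the opset choice is a guess.

# Hypothetical sketch of the ONNX export; export.py may differ in detail.
import torch
from depth_anything_v2.dpt import DepthAnythingV2  # from the cloned upstream repo

encoder = "vits"
model = DepthAnythingV2(encoder=encoder, features=64, out_channels=[48, 96, 192, 384])
state = torch.load(f"Depth-Anything-V2/checkpoints/depth_anything_v2_{encoder}.pth",
                   map_location="cpu")
model.load_state_dict(state)
model.eval()

dummy = torch.randn(1, 3, 518, 518)  # NCHW input at the 518x518 export size
torch.onnx.export(
    model, dummy, f"models/depth_anything_v2_{encoder}.onnx",
    input_names=["input"], output_names=["output"],  # names match config.pbtxt below
    opset_version=17,  # assumption
)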

TensorRT

python onnx2trt.py -o models/depth_anything_v2_vits.onnx --output depth_anything_v2_vits.engine --workspace 2

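onnx2trt.py wraps the TensorRT builder API. A minimal sketch of such a conversion with the TensorRT 8.x Python API, as an illustration rather than the script's actual code:

# Hypothetical ONNX -> TensorRT conversion, TensorRT 8.x Python API.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)

with open("models/depth_anything_v2_vits.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parse failed")

config = builder.create_builder_config()
config.set_memory_pool_limit(trt.MemoryPoolType.WORKSPACE, 2 << 30)  # 2 GiB, matching --workspace 2

engine = builder.build_serialized_network(network, config)
with open("depth_anything_v2_vits.engine", "wb") as f:
    f.write(engine)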

Or you can download the converted model from Google Drive and put it under the models directory.

Usage

python infer.py \
  --input-path assets/demo01.jpg --input-type image \
  --mode onnx --encoder vits \
  --model_path models/depth_anything_v2_vits.onnx

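Under the hood, ONNX inference comes down to resizing to the 518x518 network input, normalizing, running the session, and resizing the depth map back. A minimal sketch, assuming standard ImageNet mean/std normalization (infer.py's exact preprocessing may differ):

# Minimal ONNX Runtime inference sketch; preprocessing is an assumption.
import cv2
import numpy as np
import onnxruntime as ort

MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

img = cv2.cvtColor(cv2.imread("assets/demo01.jpg"), cv2.COLOR_BGR2RGB)
h, w = img.shape[:2]
x = cv2.resize(img, (518, 518)).astype(np.float32) / 255.0
x = ((x - MEAN) / STD).transpose(2, 0, 1)[None]  # HWC -> NCHW

sess = ort.InferenceSession("models/depth_anything_v2_vits.onnx",
                            providers=["CPUExecutionProvider"])
depth = sess.run(["output"], {"input": x})[0][0]  # (518, 518) relative depth
depth = cv2.resize(depth, (w, h))                 # back to the source resolution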

Focus on a specific region with the --crop-region option:

python infer.py \
  --input-path assets/demo01.jpg --input-type image \
  --mode onnx --encoder vits \
  --model_path models/depth_anything_v2_vits.onnx \
  --crop-region "0 550 800 750"


Options:

  • --input-path: path to input image
  • --input-type: input type, image or video
  • --mode: inference mode, onnx or trt
  • --encoder: encoder type, vits, vitb, vitl, vitg
  • --model_path: path to model file
  • --crop-region: crop region given as "x y w h" (see the sketch after this list)
  • --output-path: path to output image
  • --grayscale: output grayscale image
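A plausible sketch of how the "x y w h" crop string could be parsed and applied before preprocessing (the helper name is hypothetical; infer.py's actual parsing may differ):

# Hypothetical crop helper for the --crop-region option.
def apply_crop_region(img, crop_region: str):
    x, y, w, h = map(int, crop_region.split())
    return img[y:y + h, x:x + w]

# e.g. crop starting at (0, 550) with width 800 and height 750:
# cropped = apply_crop_region(img, "0 550 800 750")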

User Interface

python app.py

URL: http://127.0.0.1:7860

  • Preview of the crop zone

  • UI with the crop zone and the output depth map
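Port 7860 is Gradio's default, so app.py is presumably a Gradio app. A minimal sketch of such an interface (run_inference is a hypothetical stand-in for the ONNX/TensorRT inference path; the real app also exposes crop-region controls):

# Minimal Gradio sketch; not the repository's actual app.py.
import gradio as gr

def run_inference(image):
    # placeholder: call the ONNX/TensorRT depth inference from infer.py here
    return image

demo = gr.Interface(fn=run_inference, inputs=gr.Image(), outputs=gr.Image(),
                    title="Depth Anything V2")
demo.launch()  # serves on http://127.0.0.1:7860 by default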

Deploy on Triton Inference Server

Quick Start (Recommended)

The easiest way to deploy is using Docker Compose, which automatically converts the ONNX model to TensorRT on first startup:

  1. Download the pre-trained model and convert to ONNX:
mkdir -p Depth-Anything-V2/checkpoints
wget -O Depth-Anything-V2/checkpoints/depth_anything_v2_vits.pth \
  "https://huggingface.co/depth-anything/Depth-Anything-V2-Small/resolve/main/depth_anything_v2_vits.pth"
python export.py --encoder vits --input-size 518
  2. Copy the ONNX model to the model repository:
cp models/depth_anything_v2_vits.onnx* model_repository/depth_anything/1/
  3. Start the Triton server (auto-converts to TensorRT on first run):
docker compose up -d
  4. Check the logs to monitor conversion progress:
docker compose logs -f

The server will be available at:

  • HTTP: http://localhost:8000
  • gRPC: localhost:8001
  • Metrics: http://localhost:8002
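To verify the server is up, you can probe Triton's standard KServe v2 health endpoints, for example:

# Readiness probe against Triton's KServe v2 HTTP API.
import requests

server = requests.get("http://localhost:8000/v2/health/ready")
model = requests.get("http://localhost:8000/v2/models/depth_anything/ready")
print("server ready:", server.status_code == 200)
print("model ready:", model.status_code == 200)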

Manual Setup

Convert the model to a TensorRT plan and save it to the model repository:

mkdir -p model_repository/depth_anything/1
trtexec --onnx=models/depth_anything_v2_vits.onnx --saveEngine=model_repository/depth_anything/1/model.plan --fp16

Check the model's I/O and create the model configuration:

polygraphy inspect model model_repository/depth_anything/1/model.plan


Create config.pbtxt under the model_repository/depth_anything directory.

name: "depth_anything"
platform: "tensorrt_plan"
default_model_filename: "model.plan"
max_batch_size: 0
input [
  {
    name: "input"
    data_type: TYPE_FP32
    dims: [ 1, 3, 518, 518 ]
  }
]
output [
  {
    name: "output"
    data_type: TYPE_FP32
    dims: [ 1, 518, 518 ]
  }
]
instance_group [ { count: 1, kind: KIND_GPU }]

Build and run the Triton Inference Server container

  • Edge devices with NVIDIA GPU
sudo docker build -t tritonserver:v1 .
sudo docker run --runtime=nvidia --gpus all -p 8000:8000 -p 8001:8001 -p 8002:8002 \
  -v $(pwd)/model_repository:/models tritonserver:v1
  • Server with NVIDIA GPU (using Docker Compose)
docker compose up -d

The compose.yml uses nvcr.io/nvidia/tritonserver:25.01-py3, which ships TensorRT 10.8 and supports newer GPUs, including the RTX 40 series (Compute Capability 8.9) and RTX 50 series (Compute Capability 12.0).
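For reference, a compose file along these lines would reproduce the ports and model-repository mount used above; this is a reconstruction, and the repository's actual compose.yml (including the auto-conversion entrypoint) may differ:

# Hypothetical compose.yml sketch, not the repository's actual file.
services:
  triton:
    image: nvcr.io/nvidia/tritonserver:25.01-py3
    command: tritonserver --model-repository=/models
    ports:
      - "8000:8000"   # HTTP
      - "8001:8001"   # gRPC
      - "8002:8002"   # metrics
    volumes:
      - ./model_repository:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]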


Inference

python3 depth_anything_triton_infer.py --input_path assets/demo01.jpg --client_type http --model_name depth_anything

Note: Update the server_url parameter in the script or class initialization to match your server address (default is localhost).

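The client script essentially wraps the standard tritonclient API. A minimal HTTP sketch with tensor names and shapes taken from the config.pbtxt above (preprocessing omitted for brevity):

# Minimal Triton HTTP client sketch mirroring depth_anything_triton_infer.py.
import numpy as np
import tritonclient.http as httpclient

client = httpclient.InferenceServerClient(url="localhost:8000")

x = np.random.rand(1, 3, 518, 518).astype(np.float32)  # stand-in for a preprocessed image
inp = httpclient.InferInput("input", list(x.shape), "FP32")
inp.set_data_from_numpy(x)
out = httpclient.InferRequestedOutput("output")

result = client.infer(model_name="depth_anything", inputs=[inp], outputs=[out])
depth = result.as_numpy("output")  # (1, 518, 518) depth map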

Performance Benchmark

Inside the container, run the following command to benchmark the model:

perf_analyzer -m depth_anything --shape input:1,3,518,518 --percentile=95 --concurrency-range 1:4

# or for dynamic model
perf_analyzer -m depth_anything_dynamic --shape input:480,960,3 --percentile=95 --concurrency-range 1:4

Deploy with DeepStream SDK

DeepStream SDK provides a high-performance video analytics pipeline for running depth estimation on video streams with optimized GPU utilization.

Prerequisites

  • NVIDIA DeepStream SDK 8.0+ (DeepStream container recommended)
  • TensorRT engine file (.engine)
  • NVIDIA GPU with compute capability 5.0+

Quick Start with Docker (Recommended)

The simplest way to run depth estimation with DeepStream:

# Create output directory
mkdir -p output

# Run depth estimation on an image
docker run --rm \
  --device=/dev/nvidia0 \
  --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm \
  --device=/dev/nvidia-uvm-tools \
  --device=/dev/nvidia-modeset \
  -v ~/nvidia-tools:/nvidia:ro \
  -e LD_LIBRARY_PATH=/nvidia:/usr/local/cuda/lib64 \
  -v $(pwd):/workspace \
  -w /workspace/deepstream \
  nvcr.io/nvidia/deepstream:8.0-triton-multiarch \
  python3 deepstream_depth.py -i /workspace/assets/demo01.jpg -o /workspace/output/depth_output.jpg

DeepStream Python Script

The deepstream/deepstream_depth.py script provides a simple GStreamer pipeline for depth estimation:

# Basic usage
python3 deepstream_depth.py -i <input_image> -o <output_image>

# With custom config
python3 deepstream_depth.py -i input.jpg -o output.jpg -c config_infer_depth.txt

Options:

Option         Description                Default
-i, --input    Input image file (JPEG)    Required
-o, --output   Output image file          output/depth_output.jpg
-c, --config   nvinfer config file        config_infer_depth.txt
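For orientation, deepstream_depth.py builds a GStreamer pipeline roughly along these lines. The sketch below uses standard DeepStream elements but is an assumption; the real script additionally attaches a pad probe to pull the raw depth tensor (enabled by output-tensor-meta below) and colorize it:

# Hedged sketch of a single-image DeepStream depth pipeline.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)
pipeline = Gst.parse_launch(
    "filesrc location=/workspace/assets/demo01.jpg ! jpegparse ! nvv4l2decoder "
    "! m.sink_0 nvstreammux name=m batch-size=1 width=1280 height=720 "
    "! nvinfer config-file-path=config_infer_depth.txt "
    "! nvvideoconvert ! fakesink"
)
pipeline.set_state(Gst.State.PLAYING)
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE, Gst.MessageType.EOS | Gst.MessageType.ERROR)
pipeline.set_state(Gst.State.NULL)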

DeepStream Pipeline Configuration

The deepstream/ directory contains:

File                         Description
config_infer_depth.txt       nvinfer configuration for the depth model
deepstream_depth_config.txt  Full DeepStream app configuration
deepstream_depth.py          Python script with the GStreamer pipeline

nvinfer Configuration Parameters

Key parameters in config_infer_depth.txt:

Parameter           Value                                     Description
network-mode        2                                         FP16 precision (0=FP32, 1=INT8, 2=FP16)
model-engine-file   ../models/depth_anything_v2_vits.engine   TensorRT engine path
infer-dims          3;518;518                                 Input dimensions (CHW)
output-tensor-meta  1                                         Output raw tensor for post-processing
net-scale-factor    0.00392156862745098                       1/255 normalization
offsets             123.675;116.28;103.53                     ImageNet mean values
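nvinfer normalizes each pixel as y = net-scale-factor * (pixel - offset) per channel, so the values above implement mean subtraction followed by division by 255 (note there is no division by the ImageNet std here). A quick numeric check:

# nvinfer preprocessing: y = net-scale-factor * (pixel - offset).
import numpy as np

pixel = np.array([128.0, 128.0, 128.0])        # raw 0-255 RGB value
offsets = np.array([123.675, 116.28, 103.53])  # ImageNet means on the 0-255 scale
scale = 0.00392156862745098                    # 1/255

print(scale * (pixel - offsets))  # ~[0.0170, 0.0460, 0.0960]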

Converting Model for DeepStream

# Convert ONNX to TensorRT engine
trtexec --onnx=models/depth_anything_v2_vits.onnx \
        --saveEngine=models/depth_anything_v2_vits.engine \
        --fp16 \
        --memPoolSize=workspace:2048M

Using deepstream-app (Alternative)

For more complex pipelines with multiple sources:

docker run --rm \
  --device=/dev/nvidia0 --device=/dev/nvidiactl \
  --device=/dev/nvidia-uvm --device=/dev/nvidia-uvm-tools \
  -v $(pwd):/workspace \
  -w /workspace/deepstream \
  nvcr.io/nvidia/deepstream:8.0-triton-multiarch \
  deepstream-app -c deepstream_depth_config.txt

Performance Comparison

Method            Framework          Batch Size   Latency (ms)   Throughput
Triton Server     TensorRT           1            ~5-8           ~125-200 FPS
DeepStream        TensorRT/nvinfer   1            ~6-10          ~100-166 FPS
Direct TensorRT   TensorRT           1            ~4-6           ~166-250 FPS

Benchmarks on RTX 5090, FP16, 518x518 input. Actual performance varies by hardware.

Notes:

  • DeepStream adds overhead for video pipeline management but excels at multi-stream processing
  • Triton provides better scalability for serving multiple clients
  • Direct TensorRT has lowest latency for single-image inference

TODO

  • Add UI for easy usage with crop region
  • Deploy on Triton Inference Server
  • Deploy with DeepStream SDK

Citation

@article{depth_anything_v2,
  title={Depth Anything V2},
  author={Yang, Lihe and Kang, Bingyi and Huang, Zilong and Zhao, Zhen and Xu, Xiaogang and Feng, Jiashi and Zhao, Hengshuang},
  journal={arXiv:2406.09414},
  year={2024}
}

Reference: Depth-Anythingv2-TensorRT-python
