
Multimodal Retrieval Lab

Production-style multimodal retrieval system built with OpenCLIP + Qdrant + FastAPI + Streamlit.




Overview

Multimodal Retrieval Lab is an end-to-end image–text retrieval system supporting:

  • Text → Image retrieval
  • Image → Image similarity search
  • Retrieval evaluation (Recall@K, MRR@K, nDCG@K)
  • Latency profiling (p50 / p95 / mean)
  • Fully containerized deployment (Docker Compose)

Core Stack

| Layer | Technology |
|---|---|
| Embeddings | OpenCLIP (ViT-B/32, OpenAI weights) |
| Vector DB | Qdrant (cosine similarity, REST API) |
| Backend | FastAPI |
| Frontend | Streamlit |
| Deployment | Docker Compose |
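
Qdrant's cosine distance pairs naturally with CLIP embeddings: once vectors are L2-normalized, cosine similarity reduces to a plain dot product. A minimal stdlib-only sketch of the idea (illustrative, not this project's code):

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so that dot product == cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # Dot product of two unit vectors lies in [-1, 1]; higher means more similar.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy 4-d "embeddings" (real ViT-B/32 embeddings are 512-d).
query = [0.1, 0.3, 0.5, 0.1]
image = [0.1, 0.3, 0.5, 0.1]
print(cosine_similarity(query, image))  # ≈ 1.0 for identical directions
```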

Benchmark Results (Flickr8k)

Evaluation was performed on the test split against the full image index.

| Metric | Value |
|---|---|
| Recall@1 | 0.294 |
| Recall@5 | 0.533 |
| Recall@10 | 0.631 |
| MRR@10 | 0.397 |
| nDCG@10 | 0.452 |
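
With exactly one relevant image per text query (as in caption-to-image retrieval), these metrics have simple closed forms per query; corpus-level numbers are means over all queries. A stdlib sketch of the per-query scores (function names are illustrative, not the project's `src/eval` API):

```python
import math

def recall_at_k(ranked_ids, relevant_id, k):
    # 1 if the ground-truth image appears in the top-k results, else 0.
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr_at_k(ranked_ids, relevant_id, k):
    # Reciprocal of the rank of the first (only) relevant hit within top-k.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_id, k):
    # With a single relevant item the ideal DCG is 1, so nDCG = 1 / log2(rank + 1).
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = ["img_7", "img_3", "img_9"]  # system output for one query
print(mrr_at_k(ranked, "img_3", 10))  # relevant at rank 2 → 0.5
```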

Latency (CPU Docker demo)

| Stage | p50 | p95 |
|---|---|---|
| Encode | ~9 ms | ~11 ms |
| Search | ~24 ms | ~39 ms |
| End-to-End | ~33 ms | ~48 ms |
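
The p50/p95 figures are order statistics over many timed requests. A minimal sketch using the nearest-rank convention (other tools interpolate, so exact values can differ slightly; the timed call here is a stand-in):

```python
import math
import time

def percentile(samples, q):
    # Nearest-rank percentile: smallest value with at least q% of samples <= it.
    s = sorted(samples)
    idx = max(0, math.ceil(q / 100 * len(s)) - 1)
    return s[idx]

# Time a call many times, then report p50/p95 in milliseconds.
latencies_ms = []
for _ in range(200):
    t0 = time.perf_counter()
    sum(range(1000))  # stand-in for an encode + search request
    latencies_ms.append((time.perf_counter() - t0) * 1000)

print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```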

The first request may take ~1–2 minutes due to OpenCLIP model warm-up.


Project Structure

```
MultimodalNN/
│
├── src/
│   ├── embeddings/
│   ├── qdrant/
│   ├── search/
│   ├── eval/
│   ├── api/
│   └── ui/
│
├── notebooks/
│   └── 01_flickr8k_end2end.ipynb
│
├── docker/
│   ├── api/Dockerfile
│   └── ui/Dockerfile
│
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
└── README.md
```

🐳 Run with Docker

```shell
docker compose up -d --build
```

Services

| Service | URL |
|---|---|
| UI | http://localhost:8508 |
| API | http://localhost:8008/health |
| Qdrant Dashboard | http://localhost:6334/dashboard |

API Endpoints

Health

```
GET /health
```

Text → Image

```
POST /search_text
```

Request body:

```json
{
  "query": "a dog running on the grass",
  "top_k": 5
}
```
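
For example, the endpoint can be called from Python with only the standard library (host and port taken from the Services table above; the response schema is whatever the API returns):

```python
import json
import urllib.request

# Build a POST /search_text request against the local API container.
payload = json.dumps({"query": "a dog running on the grass", "top_k": 5}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8008/search_text",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the stack running (docker compose up), send it and read the hits:
# with urllib.request.urlopen(req) as resp:
#     hits = json.loads(resp.read())
```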

Image → Image

```
POST /search_image
```

Request body: `multipart/form-data` with an image file.

Similar by image_id

```
GET /similar_image/{image_id}?top_k=6
```

Reproducing Indexing & Evaluation

  1. Download the Flickr8k dataset.
  2. Place images in `data/flickr8k/images/`.
  3. Run the notebook `notebooks/01_flickr8k_end2end.ipynb`.

Notebook performs:

  • Dataset preparation
  • CLIP embedding generation
  • Qdrant indexing
  • Retrieval evaluation
  • Latency benchmarking
  • Artifact export

🎯 Project Goals

  • Multimodal embeddings engineering
  • Vector search architecture
  • Retrieval evaluation methodology
  • Modular ML system design
  • Deployable ML stack

📄 License

MIT License


Built for ML Engineering Portfolio • 2026
