
Multimodal Retrieval Lab

Production-style multimodal retrieval system built with OpenCLIP + Qdrant + FastAPI + Streamlit.




Overview

Multimodal Retrieval Lab is an end-to-end image–text retrieval system supporting:

  • Text → Image retrieval
  • Image → Image similarity search
  • Retrieval evaluation (Recall@K, MRR@K, nDCG@K)
  • Latency profiling (p50 / p95 / mean)
  • Fully containerized deployment (Docker Compose)

Core Stack

| Layer | Technology |
|---|---|
| Embeddings | OpenCLIP (ViT-B/32, OpenAI weights) |
| Vector DB | Qdrant (cosine similarity, REST API) |
| Backend | FastAPI |
| Frontend | Streamlit |
| Deployment | Docker Compose |
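
Qdrant's cosine distance pairs naturally with CLIP embeddings: once vectors are L2-normalized, cosine similarity reduces to a plain dot product. A minimal stdlib-only sketch of the idea (illustrative, not this project's code):

```python
import math

def l2_normalize(v):
    # Scale a vector to unit length so that dot product == cosine similarity.
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def cosine_similarity(a, b):
    # Dot product of two unit vectors lies in [-1, 1]; higher means more similar.
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))

# Toy 4-d "embeddings" (real ViT-B/32 embeddings are 512-d).
query = [0.1, 0.3, 0.5, 0.1]
image = [0.1, 0.3, 0.5, 0.1]
print(cosine_similarity(query, image))  # ≈ 1.0 for identical directions
```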

Benchmark Results (Flickr8k)

Evaluation was performed on the test split against the full image index.

| Metric | Value |
|---|---|
| Recall@1 | 0.294 |
| Recall@5 | 0.533 |
| Recall@10 | 0.631 |
| MRR@10 | 0.397 |
| nDCG@10 | 0.452 |
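
With exactly one relevant image per text query (as in caption-to-image retrieval), these metrics have simple closed forms per query; corpus-level numbers are means over all queries. A stdlib sketch of the per-query scores (function names are illustrative, not the project's `src/eval` API):

```python
import math

def recall_at_k(ranked_ids, relevant_id, k):
    # 1 if the ground-truth image appears in the top-k results, else 0.
    return 1.0 if relevant_id in ranked_ids[:k] else 0.0

def mrr_at_k(ranked_ids, relevant_id, k):
    # Reciprocal of the rank of the first (only) relevant hit within top-k.
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / rank
    return 0.0

def ndcg_at_k(ranked_ids, relevant_id, k):
    # With a single relevant item the ideal DCG is 1, so nDCG = 1 / log2(rank + 1).
    for rank, doc_id in enumerate(ranked_ids[:k], start=1):
        if doc_id == relevant_id:
            return 1.0 / math.log2(rank + 1)
    return 0.0

ranked = ["img_7", "img_3", "img_9"]  # system output for one query
print(mrr_at_k(ranked, "img_3", 10))  # relevant at rank 2 → 0.5
```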

Latency (CPU Docker demo)

| Stage | p50 | p95 |
|---|---|---|
| Encode | ~9 ms | ~11 ms |
| Search | ~24 ms | ~39 ms |
| End-to-End | ~33 ms | ~48 ms |
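
The p50/p95 figures are order statistics over many timed requests. A minimal sketch using the nearest-rank convention (other tools interpolate, so exact values can differ slightly; the timed call here is a stand-in):

```python
import math
import time

def percentile(samples, q):
    # Nearest-rank percentile: smallest value with at least q% of samples <= it.
    s = sorted(samples)
    idx = max(0, math.ceil(q / 100 * len(s)) - 1)
    return s[idx]

# Time a call many times, then report p50/p95 in milliseconds.
latencies_ms = []
for _ in range(200):
    t0 = time.perf_counter()
    sum(range(1000))  # stand-in for an encode + search request
    latencies_ms.append((time.perf_counter() - t0) * 1000)

print(percentile(latencies_ms, 50), percentile(latencies_ms, 95))
```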

The first request may take ~1–2 minutes due to OpenCLIP model warm-up.


Project Structure

```
MultimodalNN/
│
├── src/
│   ├── embeddings/
│   ├── qdrant/
│   ├── search/
│   ├── eval/
│   ├── api/
│   └── ui/
│
├── notebooks/
│   └── 01_flickr8k_end2end.ipynb
│
├── docker/
│   ├── api/Dockerfile
│   └── ui/Dockerfile
│
├── docker-compose.yml
├── requirements.txt
├── requirements-dev.txt
└── README.md
```

🐳 Run with Docker

```shell
docker compose up -d --build
```

Services

| Service | URL |
|---|---|
| UI | http://localhost:8508 |
| API | http://localhost:8008/health |
| Qdrant Dashboard | http://localhost:6334/dashboard |

API Endpoints

Health

```
GET /health
```

Text → Image

```
POST /search_text
```

Request body:

```json
{
  "query": "a dog running on the grass",
  "top_k": 5
}
```
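
For example, the endpoint can be called from Python with only the standard library (host and port taken from the Services table above; the response schema is whatever the API returns):

```python
import json
import urllib.request

# Build a POST /search_text request against the local API container.
payload = json.dumps({"query": "a dog running on the grass", "top_k": 5}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8008/search_text",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)

# With the stack running (docker compose up), send it and read the hits:
# with urllib.request.urlopen(req) as resp:
#     hits = json.loads(resp.read())
```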

Image → Image

```
POST /search_image
```

Request body: `multipart/form-data` with an image file.

Similar by image_id

```
GET /similar_image/{image_id}?top_k=6
```

Reproducing Indexing & Evaluation

  1. Download the Flickr8k dataset.
  2. Place images in `data/flickr8k/images/`.
  3. Run the notebook `notebooks/01_flickr8k_end2end.ipynb`.

Notebook performs:

  • Dataset preparation
  • CLIP embedding generation
  • Qdrant indexing
  • Retrieval evaluation
  • Latency benchmarking
  • Artifact export

🎯 Project Goals

  • Multimodal embeddings engineering
  • Vector search architecture
  • Retrieval evaluation methodology
  • Modular ML system design
  • Deployable ML stack

📄 License

MIT License


Built for ML Engineering Portfolio • 2026
