KonNik88/audio-similarity-tagging-hub

AudioHub — Audio Similarity & Retrieval (MLOps + Kubernetes)

AudioHub is a production-style demo of an audio similarity search system built on top of precomputed deep audio embeddings, a vector database (Qdrant), and a full MLOps/observability stack.

This repository focuses on engineering, reproducibility, deployment and observability — not on training a new audio model.


Project Scope

This project demonstrates:

  • Semantic audio similarity search using frozen CNN14 (PANNs) embeddings
  • ANN vector search in Qdrant (HNSW, cosine distance)
  • Retrieval evaluation (Recall@K, label overlap)
  • Latency tuning (HNSW ef parameter)
  • MLflow experiment tracking
  • Prometheus metrics + Grafana dashboards
  • Loki logging via Promtail
  • Airflow demo DAG orchestration
  • Docker Compose deployment
  • Kubernetes (Minikube) manifests

Important Design Decision

The FastAPI service accepts a precomputed embedding vector (dim=2048) as input.

This is intentional:

  • Keeps inference lightweight
  • Makes retrieval deterministic
  • Focuses the project on vector search and MLOps infrastructure
  • Avoids shipping heavy audio encoders in runtime
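
A minimal sketch of what a client might do before calling the service: validate the vector length and wrap it in a JSON body. The field names `embedding` and `top_k` and the helper `build_retrieve_payload` are assumptions made for illustration; the real request schema is whatever the Swagger page of the running API shows.

```python
import json

EMBEDDING_DIM = 2048  # dimension stated in this README


def build_retrieve_payload(embedding, top_k=10):
    """Validate a precomputed embedding and wrap it in a JSON-ready dict.

    The keys "embedding" and "top_k" are hypothetical; check /docs for the
    actual request model.
    """
    vector = [float(x) for x in embedding]
    if len(vector) != EMBEDDING_DIM:
        raise ValueError(f"expected {EMBEDDING_DIM} values, got {len(vector)}")
    if top_k < 1:
        raise ValueError("top_k must be >= 1")
    return {"embedding": vector, "top_k": top_k}


payload = build_retrieve_payload([0.0] * EMBEDDING_DIM, top_k=5)
body = json.dumps(payload)  # string to send as the POST body to /retrieve
```

Rejecting wrong-sized vectors on the client side keeps malformed requests away from Qdrant, which expects every query vector to match the collection dimension.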

Data & Embeddings

Dataset: FSD50K (~51k clips, multi-label)

Embeddings:

  • Model: CNN14 (PANNs)
  • Shape: (51,197, 2048)
  • dtype: float32
  • Preprocessing: 32 kHz mono, 5 s chunking, mean pooling

Large audio files and embedding matrices are not stored in git.
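
The chunking and mean-pooling step listed above can be sketched in plain Python. This is an illustration only, not the repository's extraction code: the function names are hypothetical, and how the original pipeline treats a short final chunk (padding or dropping) is not stated, so the sketch simply keeps it.

```python
SAMPLE_RATE = 32_000  # 32 kHz mono, as listed above
CHUNK_SECONDS = 5     # 5 s chunks, as listed above


def split_into_chunks(samples, chunk_len=SAMPLE_RATE * CHUNK_SECONDS):
    """Split a 1-D sample sequence into consecutive chunks.

    The last chunk may be shorter than chunk_len; it is kept unchanged here
    (an assumption, since the README does not say how it is handled).
    """
    return [samples[i:i + chunk_len] for i in range(0, len(samples), chunk_len)]


def mean_pool(chunk_embeddings):
    """Average per-chunk embedding vectors into one clip-level vector."""
    if not chunk_embeddings:
        raise ValueError("need at least one chunk embedding")
    n = len(chunk_embeddings)
    return [sum(column) / n for column in zip(*chunk_embeddings)]


# A 12 s clip yields two full chunks and one shorter remainder.
chunks = split_into_chunks([0.0] * (SAMPLE_RATE * 12))
# Toy 2-dimensional "embeddings" stand in for the 2048-dimensional ones.
clip_vector = mean_pool([[1.0, 3.0], [3.0, 5.0]])
```

In the real pipeline each chunk would pass through CNN14 to get a 2048-dimensional vector, and the pooled result becomes one row of the embedding matrix.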


Retrieval Results

  • Recall@1 ≈ 0.78
  • Recall@5 ≈ 0.91
  • Recall@10 ≈ 0.94
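
The README does not spell out how Recall@K is computed for multi-label clips. One common convention, assumed here, is a hit rate: a query counts as a hit if at least one of its top-K neighbors shares a label with it. The function name and the toy labels below are illustrative and not taken from the repository.

```python
def recall_at_k(neighbor_labels, query_labels, k):
    """Fraction of queries with at least one label-sharing neighbor in the top k.

    neighbor_labels[i] is the ranked list of label sets returned for query i
    (the query itself is assumed to be excluded already);
    query_labels[i] is the label set of query i.
    """
    if k < 1:
        raise ValueError("k must be >= 1")
    if len(neighbor_labels) != len(query_labels):
        raise ValueError("one neighbor list per query is required")
    if not query_labels:
        raise ValueError("need at least one query")
    hits = 0
    for neighbors, labels in zip(neighbor_labels, query_labels):
        wanted = set(labels)
        if any(wanted & set(candidate) for candidate in neighbors[:k]):
            hits += 1
    return hits / len(query_labels)


# Toy data: two queries with two ranked neighbors each.
queries = [{"dog"}, {"rain"}]
neighbors = [
    [{"cat"}, {"dog", "bark"}],  # second neighbor shares "dog"
    [{"wind"}, {"car"}],         # no neighbor shares "rain"
]
r1 = recall_at_k(neighbors, queries, k=1)
r2 = recall_at_k(neighbors, queries, k=2)
```

Under this definition the metric can only grow with K, which matches the monotone pattern of the reported numbers.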

HNSW tuning:

  • ef tested in {16, 32, 64, 128, 256, 512}
  • Recall saturates early
  • Best trade-off: ef = 32
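
How the trade-off was chosen is not described here. One plausible rule, shown as a sketch, is to take the smallest ef whose recall is within a small tolerance of the best recall observed, on the assumption that a smaller ef means lower search latency. The function `pick_ef`, the tolerance value, and the rows below are invented for illustration; they are not the contents of tuning_results.csv.

```python
def pick_ef(rows, tolerance=0.005):
    """Return the (ef, recall) row with the smallest ef that stays within
    `tolerance` of the best recall in `rows`."""
    rows = list(rows)
    if not rows:
        raise ValueError("need at least one tuning row")
    best_recall = max(recall for _, recall in rows)
    good_enough = [row for row in rows if row[1] >= best_recall - tolerance]
    return min(good_enough, key=lambda row: row[0])


# Placeholder (ef, recall) values, made up for this example only.
toy_rows = [(10, 0.90), (20, 0.95), (40, 0.951), (80, 0.951)]
chosen = pick_ef(toy_rows)                   # tolerates a 0.005 recall loss
strict = pick_ef(toy_rows, tolerance=0.0)    # insists on the best recall
```

When recall saturates early, most of the grid falls inside the tolerance band, so a rule like this ends up picking a small ef.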

Artifacts:

  • artifacts/retrieval/qdrant_tuning/tuning_results.csv
  • artifacts/retrieval/qdrant_tuning/best_config.json

Architecture

Runtime flow:

Client (embedding vector) -> FastAPI (/retrieve) -> Qdrant ANN search -> Prometheus metrics -> Loki logs -> Optional MLflow logging


Docker Compose Stack

Services:

  • audiohub-api (FastAPI)
  • qdrant
  • prometheus
  • grafana
  • loki
  • promtail
  • mlflow
  • airflow (webserver + scheduler)

Start:

cd docker
docker compose up -d --build

Health checks:

curl http://localhost:6333/readyz
curl http://localhost:9090/-/ready
curl http://localhost:3100/ready
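
The same checks can be scripted, for example to block a benchmark until the stack is up. The sketch below polls the three endpoints above with the standard library only; the helper names, retry count, and delay are arbitrary assumptions, not values used by the repository.

```python
import time
import urllib.error
import urllib.request

READY_URLS = [
    "http://localhost:6333/readyz",   # Qdrant
    "http://localhost:9090/-/ready",  # Prometheus
    "http://localhost:3100/ready",    # Loki
]


def http_ok(url, timeout=2.0):
    """Return True if the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        return False


def wait_ready(urls, probe=http_ok, attempts=30, delay=2.0, sleep=time.sleep):
    """Poll until every URL is ready; return the URLs that never became ready."""
    pending = list(urls)
    for attempt in range(attempts):
        pending = [url for url in pending if not probe(url)]
        if not pending:
            break
        if attempt < attempts - 1:
            sleep(delay)  # wait before the next round of probes
    return pending
```

Calling `wait_ready(READY_URLS)` returns an empty list once all three services respond. The `probe` and `sleep` parameters exist so the loop can be exercised without a running stack.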

Swagger:

http://localhost:8002/docs

Airflow Demo DAG

DAG: audiohub_demo_run

Pipeline steps:

  • wait_ready
  • benchmark retrieval
  • collect Prometheus snapshot
  • collect Loki snapshot
  • log run to MLflow
  • build markdown report

Trigger:

docker exec -it audiohub-airflow-webserver airflow dags trigger audiohub_demo_run

Artifacts saved in: artifacts/demo_runs/<RUN_ID>/


Kubernetes Deployment (Minikube)

Namespace: audiohub

Apply manifests:

kubectl apply -f k8s/

Check pods:

kubectl -n audiohub get pods

Validate API inside cluster:

kubectl -n audiohub run curl-test --rm -i --restart=Never \
  --image=curlimages/curl -- \
  curl -sS http://audiohub-api:8002/health

Loki query example:

{namespace="audiohub"}
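
To run that query outside Grafana, it can be sent to Loki's `query_range` HTTP endpoint. The sketch below only builds the URL. The base URL is an assumption: it presumes Loki is reachable on localhost:3100, which inside Minikube would typically require a port-forward. The helper name is hypothetical.

```python
import time
from urllib.parse import urlencode


def build_loki_query_url(base_url, logql, start_ns, end_ns, limit=100):
    """Build a Loki query_range URL for a LogQL selector.

    start_ns and end_ns are Unix timestamps in nanoseconds.
    """
    params = {"query": logql, "start": start_ns, "end": end_ns, "limit": limit}
    return f"{base_url.rstrip('/')}/loki/api/v1/query_range?{urlencode(params)}"


# Query the last five minutes of logs from the audiohub namespace.
end_ns = time.time_ns()
start_ns = end_ns - 5 * 60 * 1_000_000_000
url = build_loki_query_url(
    "http://localhost:3100",  # assumed address, see the note above
    '{namespace="audiohub"}',
    start_ns,
    end_ns,
)
```

The resulting URL can be fetched with curl or `urllib.request.urlopen`; `urlencode` takes care of escaping the braces and quotes in the selector.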

Promtail + Kubernetes Lessons Learned

Porting Loki/Promtail to Kubernetes (Minikube + docker runtime) required:

  • Correct relabeling of host
  • Proper path including pod UID and container name
  • Mounting /var/log and /var/lib/docker/containers
  • Using the docker pipeline stage (not cri), because Minikube runs on the docker runtime here
  • Understanding that the Promtail readinessProbe fails while Promtail has 0 targets to tail

This repo contains the final working manifests under k8s/promtail/.


Roadmap (Optional)

  • Two-stage retrieval (Qdrant top-K + reranker)
  • Load testing (k6/locust)
  • Optional embedding service (audio -> vector)
  • Helm/Kustomize packaging

License

MIT — see LICENSE.
