Commit e0a5bc5

refactor: unify embedding providers via LanceDB registry
- Replace BaseEmbeddingProvider/ModelSpec with LanceDB registry-based EmbeddingSpec
- Add embed-anything and hyper-models as @register'd LanceDB embedding functions
- Make model parameter required in compute_embeddings() with auto-detection
- Add list_embedding_providers() and get_provider_info() to public API
- Use uv workspace for hyper-models dependency (local development)
- Add CONTRIBUTING.md with development setup guide
- Update docs to use full HuggingFace model IDs
- Smarter default geometry selection in launch() auto-layout
1 parent 2291a0b commit e0a5bc5

21 files changed

Lines changed: 747 additions & 1000 deletions

CONTRIBUTING.md

Lines changed: 82 additions & 0 deletions
@@ -0,0 +1,82 @@
+# Contributing to HyperView
+
+We welcome contributions! This guide will help you set up your development environment and submit high-quality pull requests.
+
+## Development Setup
+
+### Requirements
+- **Python 3.10+**
+- **uv** (Package manager)
+- **Node.js** (For frontend development)
+
+### One-time Setup
+Clone the repo and install dependencies:
+
+```bash
+git clone https://github.com/Hyper3Labs/HyperView.git
+cd HyperView
+
+# Create virtual environment and install dev dependencies
+uv venv .venv
+source .venv/bin/activate
+uv pip install -e ".[dev]"
+
+# Install frontend dependencies
+cd frontend
+npm install
+cd ..
+```
+
+## Running Locally
+
+For the best development experience, run the backend and frontend in separate terminals.
+
+### 1. Start the Backend
+Runs the Python API server at `http://127.0.0.1:6262`.
+
+```bash
+uv run hyperview demo --samples 200 --no-browser
+```
+
+_Tip: Use `HF_DATASETS_OFFLINE=1` if you have cached datasets and want to work offline._
+
+### 2. Start the Frontend
+Runs the Next.js dev server at `http://127.0.0.1:3000` with hot reloading.
+
+```bash
+cd frontend
+npm run dev
+```
+
+The frontend automatically proxies API requests (`/api/*`) to the backend.
+
+## Common Tasks
+
+### Testing & Linting
+Please ensure all checks pass before submitting a PR.
+
+```bash
+# Python
+uv run pytest # Run unit tests
+uv run ruff format . # formatting
+uv run ruff check . --fix # Linting
+
+# Frontend
+cd frontend
+npm run lint
+```
+
+### Exporting the Frontend
+The Python package bundles the compiled frontend. If you modify the frontend, you must regenerate the static assets so they can be served by the Python backend in production/demos.
+
+```bash
+bash scripts/export_frontend.sh
+```
+_This compiles the frontend and places artifacts into `src/hyperview/server/static/`. Do not edit files in that directory manually._
+
+## Pull Request Guidelines
+
+1. **Scope**: Keep changes focused. Open an issue first for major refactors or new features.
+2. **Tests**: Add tests for new logic where practical.
+3. **Visuals**: If changing the UI, please attach a screenshot or GIF to your PR.
+4. **Format**: Ensure code is formatted with `ruff` (Python) and `prettier` (JS/TS implicit in `npm run lint`).

README.md

Lines changed: 1 addition & 1 deletion
@@ -68,7 +68,7 @@ dataset.add_from_huggingface(
 # dataset.add_images_dir("/path/to/images", label_from_folder=True)
 
 # Compute embeddings and visualization
-dataset.compute_embeddings()
+dataset.compute_embeddings(model="openai/clip-vit-base-patch32")
 dataset.compute_visualization()
 
 # Launch the UI
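
The README change above passes `model` explicitly, and the commit message says the provider is auto-detected from it. The detection logic itself is not part of the hunks shown here, so the following is only a minimal sketch of how such routing could work. It assumes that IDs of the form `org/name` go to embed-anything and that a small set of known names goes to hyper-models. `detect_provider` and `HYPER_MODELS_IDS` are hypothetical names invented for illustration, not HyperView's API.

```python
# Hypothetical sketch: this is NOT HyperView's actual detection code.
# Assumption: "org/name" IDs are Hugging Face models handled by embed-anything,
# and a small known set of bare names is handled by hyper-models.

# Assumed set; "hycoclip-vit-s" is the only hyper-models ID visible in this commit.
HYPER_MODELS_IDS = frozenset({"hycoclip-vit-s"})


def detect_provider(model: str) -> str:
    """Return a provider name for a model ID, or raise if it cannot be inferred."""
    if not model:
        raise ValueError("model is required, e.g. 'openai/clip-vit-base-patch32'")
    if model in HYPER_MODELS_IDS:
        return "hyper-models"
    if "/" in model:
        return "embed-anything"
    raise ValueError(f"cannot infer an embedding provider for model {model!r}")


if __name__ == "__main__":
    for model_id in ("openai/clip-vit-base-patch32", "hycoclip-vit-s"):
        print(model_id, "->", detect_provider(model_id))
```

Under these assumptions a short alias such as `"clip"` is rejected rather than guessed, which matches the move to full model IDs in the docs.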

docs/datasets.md

Lines changed: 2 additions & 2 deletions
@@ -63,8 +63,8 @@ Samples are **never implicitly deleted**. Use `hv.Dataset.delete("name")` for ex
 ## Computing Embeddings
 
 ```python
-# High-dimensional embeddings (CLIP/ResNet)
-dataset.compute_embeddings(model="clip", show_progress=True)
+# High-dimensional embeddings (CLIP)
+dataset.compute_embeddings(model="openai/clip-vit-base-patch32", show_progress=True)
 
 # 2D projections for visualization
 dataset.compute_visualization() # UMAP to Euclidean + Hyperbolic

notebooks/demo.ipynb

Lines changed: 0 additions & 194 deletions
This file was deleted.

pyproject.toml

Lines changed: 7 additions & 1 deletion
@@ -26,7 +26,7 @@ dependencies = [
     "fastapi>=0.128.0",
     "uvicorn[standard]>=0.40.0",
     "embed-anything>=0.7.0",
-    "hyper-models @ git+https://github.com/Hyper3Labs/hyper-models.git@7489595f4f665802671136872b2bf61794995e1b",
+    "hyper-models>=0.1.0", # PyPI package: https://pypi.org/project/hyper-models/
     "numpy>=1.26.4,<2.4",
     "umap-learn>=0.5.11",
     "pillow>=12.1.0",
@@ -88,3 +88,9 @@ ignore = ["E501"]
 [tool.pytest.ini_options]
 asyncio_mode = "auto"
 testpaths = ["tests"]
+
+[tool.uv.workspace]
+members = ["hyper_models"]
+
+[tool.uv.sources]
+hyper-models = { workspace = true }

scripts/demo.py

Lines changed: 4 additions & 4 deletions
@@ -68,11 +68,11 @@ def main():
         max_samples=args.samples,
     )
 
-    dataset.compute_embeddings(model=args.model, show_progress=True)
+    space_key = dataset.compute_embeddings(model=args.model, show_progress=True)
 
-    # Compute both euclidean and poincare layouts
-    dataset.compute_visualization(geometry="euclidean")
-    dataset.compute_visualization(geometry="poincare")
+    # Compute a single layout for the UI to display by default.
+    # Switch to geometry="euclidean" for standard 2D UMAP.
+    dataset.compute_visualization(space_key=space_key, geometry="poincare")
 
     if args.no_server:
         return

scripts/demo_hyperbolic_clip.py

Lines changed: 5 additions & 8 deletions
@@ -1,18 +1,16 @@
 #!/usr/bin/env python
-"""Demo: CLIP (Euclidean) + HyCoCLIP (Poincaré) on CIFAR-100."""
+"""Demo: CLIP (Euclidean) + hyper-models (Poincaré) on CIFAR-100."""
 
 import hyperview as hv
-import hyperview.embeddings.providers.hycoclip # noqa: F401
-from hyperview.embeddings.providers import ModelSpec
 
-DATASET_NAME = "cifar100_clip_hycoclip"
+DATASET_NAME = "cifar100_clip_hyper_models"
 HF_DATASET = "uoft-cs/cifar100"
 HF_SPLIT = "test"
 HF_IMAGE_KEY = "img"
 HF_LABEL_KEY = "fine_label"
 NUM_SAMPLES = 200
 CLIP_MODEL_ID = "openai/clip-vit-base-patch32"
-HYCOCLIP_MODEL_ID = "hycoclip_vit_s"
+HYPER_MODELS_MODEL_ID = "hycoclip-vit-s"
 
 
 def main() -> None:
@@ -29,9 +27,8 @@ def main() -> None:
 
     clip_space = dataset.compute_embeddings(CLIP_MODEL_ID)
     dataset.compute_visualization(space_key=clip_space, geometry="euclidean")
-    hycoclip_spec = ModelSpec(provider="hycoclip", model_id=HYCOCLIP_MODEL_ID)
-    hycoclip_space = dataset.compute_embeddings(hycoclip_spec)
-    dataset.compute_visualization(space_key=hycoclip_space, geometry="poincare")
+    hyper_space = dataset.compute_embeddings(model=HYPER_MODELS_MODEL_ID)
+    dataset.compute_visualization(space_key=hyper_space, geometry="poincare")
 
     print("Launching at http://127.0.0.1:6262")
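
The demo above lays out one embedding space with `geometry="euclidean"` and another with `geometry="poincare"`. As a rough illustration of why the two geometries behave differently, the sketch below compares the straight-line distance with the standard Poincaré ball distance, d(u, v) = arcosh(1 + 2‖u − v‖² / ((1 − ‖u‖²)(1 − ‖v‖²))). The diff does not show how HyperView computes its layouts, so this is a textbook formula under that assumption, and the function names are illustrative only.

```python
import math
from typing import Sequence


def euclidean_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Ordinary straight-line distance."""
    return math.dist(u, v)


def poincare_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_norm_u = sum(x * x for x in u)
    sq_norm_v = sum(x * x for x in v)
    if sq_norm_u >= 1.0 or sq_norm_v >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_norm_u) * (1.0 - sq_norm_v)))


if __name__ == "__main__":
    near_center = ((0.0, 0.0), (0.05, 0.0))
    near_edge = ((0.90, 0.0), (0.95, 0.0))
    for a, b in (near_center, near_edge):
        print(a, b, euclidean_distance(a, b), poincare_distance(a, b))
```

Both pairs are the same Euclidean distance apart, but the pair near the boundary is much farther apart in the Poincaré metric, which is what gives hyperbolic layouts extra room toward the edge of the disk.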

src/hyperview/__init__.py

Lines changed: 13 additions & 1 deletion
@@ -1,6 +1,18 @@
 """HyperView - Open-source dataset curation with hyperbolic embeddings visualization."""
 
 from hyperview.api import Dataset, launch
+from hyperview.embeddings.engine import (
+    EmbeddingSpec,
+    get_provider_info,
+    list_embedding_providers,
+)
 
 __version__ = "0.1.0"
-__all__ = [
-    "Dataset", "launch", "__version__"]
+__all__ = [
+    "Dataset",
+    "EmbeddingSpec",
+    "get_provider_info",
+    "launch",
+    "list_embedding_providers",
+    "__version__",
+]
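
The exports above add provider discovery to the public API, and the commit message describes providers as `@register`'d embedding functions. The signatures and return types of `list_embedding_providers()` and `get_provider_info()` are not visible in this diff, so the sketch below does not call them. It is a plain-Python stand-in for the decorator-registry pattern, using no LanceDB code, and every name in it (`register`, `list_providers_sketch`, `provider_info_sketch`, the stub classes) is hypothetical.

```python
# Hypothetical stand-in for a decorator-based provider registry.
# It does not use LanceDB or HyperView; all names here are illustrative.
from typing import Callable, Dict, List

_REGISTRY: Dict[str, type] = {}


def register(name: str) -> Callable[[type], type]:
    """Class decorator that records a provider class under a unique name."""

    def decorator(cls: type) -> type:
        if name in _REGISTRY:
            raise ValueError(f"provider {name!r} is already registered")
        _REGISTRY[name] = cls
        return cls

    return decorator


@register("embed-anything")
class EmbedAnythingStub:
    """Placeholder for a provider that loads Hugging Face model IDs."""


@register("hyper-models")
class HyperModelsStub:
    """Placeholder for a provider that serves hyperbolic models."""


def list_providers_sketch() -> List[str]:
    """Return registered provider names in sorted order."""
    return sorted(_REGISTRY)


def provider_info_sketch(name: str) -> Dict[str, str]:
    """Return basic metadata for one provider; raises KeyError if unknown."""
    cls = _REGISTRY[name]
    return {"name": name, "class": cls.__name__, "doc": (cls.__doc__ or "").strip()}


if __name__ == "__main__":
    for provider in list_providers_sketch():
        print(provider_info_sketch(provider))
```

The appeal of this pattern is that adding a provider only requires defining and decorating a class, with no central list to edit.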
