Problem

DataFrames displayed in nteract notebooks currently render as text/html tables (pandas) or text/plain (fallback). We want them to render as interactive, filterable, sortable tables using sift (@nteract/sift) — a fast dataframe viewer built on pretext + WASM (100k+ rows at 120fps).
Architecture
Data flow: Kernel → RuntimeAgent → Daemon → Frontend
Key insight: Parquet bytes bypass IOPub entirely. The kernel writes directly to the blob store (same filesystem as agent), then emits a lightweight JSON reference on IOPub. The agent's normal manifest pipeline picks up the reference. The frontend resolves the blob URL and sift's WASM decodes the parquet client-side.
Why out-of-band (not IOPub)

- The agent's blob store is the same filesystem directory — direct writes have no protocol overhead
- Display data on IOPub is just {"blob_hash": "..."} — a few bytes
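The kernel side of this flow can be sketched as follows. Everything here is illustrative: `write_blob`, `dataframe_bundle`, and the `blob_dir` argument are hypothetical names, and the real design routes writes through the agent's blob store API rather than a raw directory path.

```python
import hashlib
import io
import json

MIME = "application/vnd.nteract.dataframe+parquet"

def write_blob(data: bytes, blob_dir: str) -> str:
    """Content-addressed write: the sha256 of the bytes names the blob."""
    blob_hash = hashlib.sha256(data).hexdigest()
    with open(f"{blob_dir}/{blob_hash}", "wb") as f:
        f.write(data)
    return blob_hash

def dataframe_bundle(df, blob_dir: str) -> dict:
    """Serialize df to parquet, store the bytes out-of-band, and return
    the tiny mime bundle that actually travels over IOPub."""
    buf = io.BytesIO()
    df.to_parquet(buf)  # pandas; polars would use df.write_parquet(buf)
    blob_hash = write_blob(buf.getvalue(), blob_dir)
    return {
        MIME: json.dumps({"blob_hash": blob_hash}),
        "text/plain": repr(df),  # graceful fallback for other frontends
    }
```

Note that the parquet bytes never appear in the bundle — only the hash crosses IOPub, which is why dataframe size does not pressure the messaging path.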
MIME type
application/vnd.nteract.dataframe+parquet
The data field is a JSON string with the blob hash. All schema metadata (columns, types, row count) lives inside the parquet file itself — parquet is self-describing. No separate metadata needed.
Components
1. Repo structure: monorepo package at packages/sift/
Add sift as a pnpm workspace package. pnpm-workspace.yaml already has packages/*. The nteract-predicate WASM crate joins the Cargo workspace. Sift's standalone dev workflow (cd packages/sift && pnpm dev) is preserved for fast iteration.
2. Frontend: DataFrameOutput component
- Register the custom MIME type in MediaProvider (same pattern as widget-view in App.tsx)
- Add to MAIN_DOM_SAFE_TYPES (sift is pure DOM, no script execution risk)
- Set DEFAULT_PRIORITY above text/html
- DataFrameOutput resolves the blob URL via the blob port, renders <SiftTable url={blobUrl} />

3. Python: IPython display formatter + blob upload

Follow the pattern pandas uses for application/vnd.dataresource+json (pandas/io/formats/printing.py:302):
- Register a custom IPython formatter for our MIME type
- Handle both pandas.DataFrame and polars.DataFrame (register by type, not by method)
- On display: df.to_parquet(buf) → write to blob store → return {blob_hash} + text/plain fallback
- Non-invasive: our MIME type is added alongside the existing text/html, so other frontends fall back gracefully
Research findings:

- pandas: has _repr_html_(), no _repr_mimebundle_(); there is precedent for custom MIME formatters via IPython's display_formatter.formatters
- polars: has _repr_html_() only, no _repr_mimebundle_(); our formatter registers by type, so it still works
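A sketch of that type-based registration, modeled on the formatter pattern pandas uses for application/vnd.dataresource+json. The `make_payload` callable (returning the `{"blob_hash": ...}` JSON string) and the `_repr_parquet_bundle_` method name are hypothetical, not part of the design above.

```python
from IPython import get_ipython
from IPython.core.formatters import BaseFormatter
from traitlets import ObjectName, Unicode

MIME = "application/vnd.nteract.dataframe+parquet"

class ParquetDataFrameFormatter(BaseFormatter):
    """Formatter for the custom MIME type; emits a JSON string."""
    format_type = Unicode(MIME)
    print_method = ObjectName("_repr_parquet_bundle_")  # hypothetical name
    _return_type = (str,)

def register(make_payload):
    """make_payload(df) -> '{"blob_hash": "..."}' (hypothetical callable).

    Registers by type, not by method, so it works even though neither
    pandas nor polars defines _repr_mimebundle_().
    """
    ip = get_ipython()
    if ip is None:
        return  # plain Python, no display machinery to hook
    formatters = ip.display_formatter.formatters
    if MIME not in formatters:
        formatters[MIME] = ParquetDataFrameFormatter(parent=ip.display_formatter)
    fmt = formatters[MIME]
    fmt.enabled = True

    import pandas as pd
    fmt.for_type(pd.DataFrame, make_payload)
    try:
        import polars as pl
        fmt.for_type(pl.DataFrame, make_payload)
    except ImportError:
        pass  # polars not installed; pandas-only registration
```

Because registration is by concrete type, adding another dataframe library later is one more `for_type` call, with no changes required in the libraries themselves.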
4. Output widget / iframe handling
DataFrames inside ipywidgets.Output render in iframes. Start with a hybrid: sift in main DOM, fall back to text/html in iframe contexts (the isInIframe() check already exists in MediaRouter). Revisit iframe sift rendering later if needed.
5. Daemon: no changes needed
Blob store is content-addressed and media-type-agnostic. The custom MIME type's data is JSON text → normal manifest pipeline.
Phasing
Phase 1: Sift in monorepo + frontend renderer
- Copy sift source into packages/sift/
- Add nteract-predicate to the Cargo workspace, wire up the WASM build
- DataFrameOutput component + MIME registration

Phase 2: Python formatter + blob upload (end-to-end)

- blob.upload() in runtimed (direct filesystem write)
- pd.DataFrame({"x": range(100_000)}) → sift table

Phase 3: Polish

- text/plain + text/html fallbacks

Open questions

- WASM in Tauri bundle — nteract-predicate.wasm needs to be in the app assets. Copy pipeline TBD.
- Formatter auto-registration — agent-injected startup code vs IPython extension vs kernel spec hook?
- Max DataFrame size — UX for exceeding 100MB blob limit? Truncate? Warn?
- Remote kernels — the blob upload API is abstract (blob_store.upload()) so the transport can change for SSH agents (feat(runtimed): SSH remote runtimes #1334). Not solving now.
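For the max-size question, one hedged option is a guard in the formatter: serialize, measure, and return nothing past the limit so the existing text/html / text/plain path renders instead. `maybe_parquet_payload` is a sketch, not a decided design; the 100MB figure is the blob limit mentioned above.

```python
import io
from typing import Optional

BLOB_LIMIT = 100 * 1024 * 1024  # the 100MB blob limit mentioned above

def maybe_parquet_payload(df, limit: int = BLOB_LIMIT) -> Optional[bytes]:
    """Return parquet bytes, or None when the frame exceeds the blob
    limit (the caller then falls back to text/html / text/plain)."""
    buf = io.BytesIO()
    df.to_parquet(buf)
    data = buf.getvalue()
    if len(data) > limit:
        return None  # too large for the blob store; skip the sift payload
    return data
```

This keeps oversized frames rendering exactly as they do today rather than failing; truncation or a warning banner could be layered on later without changing the guard's shape.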
Related