Lessons from Ordeq: Simplifying Kedro #5178

DimedS (Collaborator) · 2025-10-27T15:54:28Z

This is a summary of the Kedro Tech Design session (22 Oct 2025) led by Simon Brugman.

🧭 Summary

In this Tech Design, @sbrugman, a long-time Kedro contributor and Technical Steering Committee member, presented his experience building ORDEQ, a lightweight pipeline framework.
Simon shared insights from using Kedro in production and explained how ORDEQ re-imagines Kedro’s core ideas around pipelines, datasets, and catalogs with a focus on simplicity, minimal dependencies, and a Python-first developer experience.

The discussion centred on what Kedro can learn from ORDEQ’s design — especially in reducing complexity, improving usability, and making deployment smoother in large enterprise environments.

🏦 Kedro at One Company: From Wide Adoption to Decline

At One Company, Kedro was initially adopted widely across data science and engineering teams — over 10 data scientists and multiple squads used it to structure their machine learning and data pipelines.
It became the de facto standard for reproducible experimentation, thanks to its clear project structure, modular pipelines, and strong integration with Kedro-Viz.

However, over time, adoption declined significantly, particularly on the engineering and production side.
While Kedro worked well for research and prototyping, it became hard to maintain and deploy in One Company’s highly controlled, cloud-based (GCP) environment.

🧩 From Kedro to ORDEQ: Closing the Gaps

To overcome these challenges, One Company decided to build their own lightweight framework — ORDEQ — inspired by Kedro’s core ideas but redesigned for simplicity and production reliability.
ORDEQ keeps Kedro’s strengths (pipelines, modularity, reproducibility) but removes or reimagines the parts that caused friction in day-to-day use.

Below are the key Kedro pain points and how ORDEQ addressed them:

🧱 Heavy dependencies → Zero-dependency core
- Kedro’s 40+ dependencies (Click, Jinja2, PyYAML, fsspec, etc.) made it slow to install and hard to deploy in restricted environments.
- ORDEQ was built with zero runtime dependencies by default — users can extend it only when needed.
⚙️ Complex dataset abstraction → Simple Pythonic classes
- Kedro datasets required subclassing AbstractDataset and handling filesystem protocols, versioning, and hooks.
- ORDEQ replaced this with minimal dataset classes (simple dataclasses with load() / save()), usually under 10 lines of code.
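
As a rough illustration of what "a dataclass with load() / save()" can look like, here is a minimal sketch using only the standard library. `TextFile` is a hypothetical name invented for this example; it is not taken from ORDEQ, whose actual dataset classes may differ.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TextFile:
    """Hypothetical dataset (not ORDEQ's API): a path plus load/save."""

    path: Path

    def load(self) -> str:
        return self.path.read_text(encoding="utf-8")

    def save(self, data: str) -> None:
        self.path.write_text(data, encoding="utf-8")


# Round-trip through a temporary directory to show the two methods in use.
with tempfile.TemporaryDirectory() as tmp:
    notes = TextFile(Path(tmp) / "notes.txt")
    notes.save("hello")
    loaded = notes.load()
```

Because the dataset is an ordinary dataclass, it gets equality, a readable repr, and editor support without any framework machinery.
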
📁 YAML configuration → Pure Python catalog
- Kedro’s catalog.yml and conf/base/parameters.yml files added indirection and were hard to navigate.
- ORDEQ defines datasets directly in Python, e.g.:
```
from ordeq_pandas import PandasCSV
companies = PandasCSV(path="data/companies.csv")
```
  This allows autocompletion, refactoring, and type checking.
🚀 CLI and Session coupling → Python-first runner
- Kedro requires running through the CLI (kedro run) or a KedroSession.
- ORDEQ pipelines can run as plain Python objects:
```
pipeline.run()  # illustrative; the exact call was left unconfirmed in these notes
```
  making integration into notebooks, APIs, and orchestrators trivial.
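
To make the "plain Python objects" idea concrete, the sketch below builds a toy pipeline that can be run from any script, notebook, or API handler. `Node` and `Pipeline` are hypothetical classes written for this example and are not ORDEQ's or Kedro's API; nodes simply run in the order given, with no graph sorting.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Node:
    """Hypothetical node: a function, its input names, and its output name."""

    func: Callable[..., Any]
    inputs: tuple[str, ...]
    output: str


@dataclass
class Pipeline:
    """Hypothetical pipeline: runs nodes sequentially in list order."""

    nodes: list[Node] = field(default_factory=list)

    def run(self, data: dict[str, Any]) -> dict[str, Any]:
        results = dict(data)  # copy so the caller's dict is not mutated
        for node in self.nodes:
            args = [results[name] for name in node.inputs]
            results[node.output] = node.func(*args)
        return results


def clean(raw: list[str]) -> list[str]:
    return [name.strip().lower() for name in raw]


def count_unique(names: list[str]) -> int:
    return len(set(names))


pipeline = Pipeline([
    Node(clean, ("raw",), "cleaned"),
    Node(count_unique, ("cleaned",), "n_unique"),
])

result = pipeline.run({"raw": [" Acme", "acme ", "Globex"]})
print(result["n_unique"])  # prints 2
```

Nothing here needs a CLI or a session object, which is the property the session highlighted.
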
🧩 Hard Airflow/GCP integration → Direct orchestration compatibility
- One Company struggled with Kedro-Airflow conversions and dependency management on GCP.
- ORDEQ exposes simple Python interfaces that can be directly imported into Airflow DAGs or other orchestration systems without conversion layers.
🧠 Learning curve → Minimal API surface
- Kedro’s abstractions (hooks, datasets, catalog, runners) are powerful but intimidating for small teams.
- ORDEQ focuses on explicit Python over convention, making onboarding easier for both data scientists and engineers.
📊 Visualization → Retain Kedro-Viz compatibility
- ORDEQ still exports pipelines in a JSON format compatible with Kedro-Viz (and even supports Mermaid diagrams), preserving the strong visual debugging workflow.
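
For readers unfamiliar with Mermaid, the sketch below shows how little is needed to turn a list of pipeline edges into a Mermaid flowchart. It is an assumption-laden toy, not ORDEQ's exporter: the function name `to_mermaid` and the dataset and node names are invented for illustration.

```python
def to_mermaid(edges: list[tuple[str, str]]) -> str:
    """Render (source, target) pairs as a top-down Mermaid flowchart."""
    lines = ["graph TD"]
    lines += [f"    {src} --> {dst}" for src, dst in edges]
    return "\n".join(lines)


# Hypothetical pipeline: dataset and node names are made up for illustration.
edges = [
    ("companies", "clean_companies"),
    ("clean_companies", "model_input"),
]
diagram = to_mermaid(edges)
print(diagram)
```

The resulting text can be pasted into any Markdown renderer that supports Mermaid blocks.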

DimedS (Collaborator, Author) · 2025-10-27T15:54:40Z

🧱 1. Heavy dependencies. Proposal

The topic of Kedro’s large dependency footprint has a long history of discussion — most recently summarised in #3967.

In the recent Tech Design, Simon Brugman clearly demonstrated that the number of mandatory dependencies can become a serious blocker for production deployment in large organisations, eventually limiting Kedro adoption.

As highlighted, there is a clear functional split between:

- Runtime usage (kedro run): used in production or deployment environments
- Development usage (kedro new, kedro pipeline create, etc.): used by developers to build Kedro projects

Most heavy dependencies are only required for the latter, not for running pipelines.

🚀 Proposal direction

The idea is to start a gradual effort towards dependency separation, in three steps:

Step 1 – Make the dependency boundaries explicit (non-breaking)

In the next minor release, update pyproject.toml to clearly separate runtime vs development dependencies.
This can be done by introducing optional dependency groups ([core], [all], etc.) purely as a signal, without actually changing what gets installed by default.
This step is completely non-breaking and helps establish a clear dependency map for future changes.
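
One possible way to inspect such a dependency map is to group a package's requirement strings by the extra they belong to. The sketch below assumes requirement strings in the form returned by `importlib.metadata.requires(...)`, where optional dependencies carry an `extra == "..."` marker. The helper name `group_by_extra` and the sample list are hypothetical; the sample is not Kedro's real dependency list.

```python
import re
from collections import defaultdict

# Matches the extra name inside a marker such as: extra == "all"
_EXTRA = re.compile(r"""extra\s*==\s*["']([^"']+)["']""")


def group_by_extra(requirements: list[str]) -> dict[str, list[str]]:
    """Group requirement strings by extra; unmarked ones go under 'default'."""
    groups: dict[str, list[str]] = defaultdict(list)
    for req in requirements:
        # Environment markers follow a ';' in a requirement string.
        spec, _, marker = req.partition(";")
        match = _EXTRA.search(marker)
        groups[match.group(1) if match else "default"].append(spec.strip())
    return dict(groups)


# Illustrative input only; NOT Kedro's actual dependencies.
sample = [
    "pluggy>=1.0",
    "click>=8.0",
    'pluggy>=1.0; extra == "core"',
    'click>=8.0; extra == "all"',
]
groups = group_by_extra(sample)
print(groups)
```

For an installed package, the input could come from `importlib.metadata.requires("kedro")`, which returns a list of such strings or `None`.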

Step 2 – Discuss swapping defaults before Kedro 2.0

Once the boundaries are defined and documented, consider flipping the default install behaviour:

- pip install kedro → minimal runtime (suitable for kedro run)
- pip install kedro[all] → full developer experience

This would reduce dependency load in production, while remaining easy to use for developers.
Such a change would be mildly breaking, so it should be introduced with clear upgrade guidance.
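
Part of that upgrade guidance could live in the code itself. The sketch below shows one common pattern, offered as an assumption rather than a plan Kedro has adopted: import a developer-only dependency lazily and, if it is missing, fail with a message that names the extra to install. The helper `require_optional` is hypothetical, and the `kedro[all]` extra is the one proposed above, not an existing install target.

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str = "all") -> ModuleType:
    """Import an optional dependency, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # The extra name is an assumption taken from the proposal above.
        raise ModuleNotFoundError(
            f"'{module_name}' is not installed; try: pip install 'kedro[{extra}]'"
        ) from exc


# A module that is always available imports normally.
json_module = require_optional("json")
```

A developer command would call the helper at the point of use, so a runtime-only install never touches the heavier packages.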

Step 3 – Split packages in Kedro 2.0

For Kedro 2.0, explore introducing a formal split, following the model used by other Python ecosystems:

- kedro-core: minimal runtime layer
- kedro: meta-package depending on kedro-core and including developer tools


yetudada (Maintainer) · 2025-10-29T12:19:19Z

I wanted to share a quick reflection from the conversation around Ordeq. Some of the ideas Simon mentioned, such as promoting a pure Python catalog, adopting a Python-first runner approach, and reducing the API surface by removing the framework layer, are all directions that have existed in Kedro’s past.

If you look back at Kedro 0.13 (before it was open-sourced), it actually looked very similar to what Ordeq is proposing now. I’d really encourage you to dig up the internal documentation or code from that version if you can, since it provides a useful reference point.

This raises an interesting question: did we make a mistake in moving Kedro toward being more framework-like, rather than staying closer to a lightweight library model? It might be worth revisiting that trade-off, even as a quick prototype, to see what that earlier simplicity could look like today.


astrojuanlu · 2025-11-04T18:24:55Z

I'd like to use this opportunity to cross-link some not-so-old ideas that we had about this 😉 #3659. I'm sad that I did a terrible job at selling the idea, but happy that the project is determined to take this direction.
