[KEP 5] Kedro Inspection API #5405

ravi-kumar-pilla · 2026-02-26T00:47:43Z

Collaborator

Related issue: #5266
Author: @ravi-kumar-pilla
Shepherd: @rashidakanchwala , @jitu5

Context

Kedro's ecosystem has grown to include several plugins that need to understand a project's structure (its pipelines, nodes, datasets, and parameters) without executing any pipeline runs. Today, every plugin (Kedro-Viz, vscode-kedro, kedro-airflow) independently reimplements this logic. Each one loads the project differently, makes different assumptions about the catalog, and has no guaranteed contract with Kedro's internals.

This has led to:

Duplicated inspection code across the ecosystem
Breakage when Kedro internals change (plugins tied to private APIs)
No single authoritative answer to "what does this project look like?"

A standardised inspection API in Kedro core would establish a single contract for this purpose.

What are we trying to do?

Provide a first-class Python API in kedro.inspection that returns a structured, serialisable snapshot of a Kedro project (its metadata, registered pipelines, nodes with their inputs/outputs, catalog dataset types, and parameter keys), built using only public Kedro APIs and without running any pipeline.

The API should be usable by any tool that needs to understand a Kedro project programmatically.

What this is NOT

Not a runtime execution API. get_snapshot() never loads datasets, runs nodes, or modifies project state.
Not a dependency-free inspection solution. The v1 API requires all project dependencies to be installed, the same prerequisite as kedro run. Solving dependency-free inspection is a significant architectural problem that affects Node, Pipeline, _ProjectPipelines, and find_pipelines(). This is tracked as a separate spike.
Not a CLI/REST feature in v1. CLI integration (kedro inspect) and REST endpoints are out of scope for the initial release. The v1 target integration point is the programmatic Python API only.
Not a graph representation. The snapshot captures node inputs/outputs as lists of dataset names. Graph edges are implicit and can be derived by consumers. Explicit graph structures are the concern of downstream tools.
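Because edges are implicit, a consumer can recover them by matching each node input to the node that produces it. The sketch below is a minimal illustration, not part of the proposed API: the node data is hypothetical and plain dicts stand in for NodeSnapshot. It relies on Kedro's rule that a dataset is produced by at most one node in a pipeline.

```python
from __future__ import annotations

# Hypothetical NodeSnapshot-shaped data (plain dicts stand in for the models).
NODES = [
    {"name": "preprocess", "inputs": ["raw_data", "params:cleaning"], "outputs": ["clean_data"]},
    {"name": "train", "inputs": ["clean_data", "params:model"], "outputs": ["model"]},
    {"name": "evaluate", "inputs": ["model", "clean_data"], "outputs": ["metrics"]},
]


def derive_node_edges(nodes: list[dict]) -> set[tuple[str, str]]:
    """Return (upstream, downstream) node pairs connected by a shared dataset."""
    # Kedro allows at most one producer per dataset, so a flat lookup is enough.
    producer_of = {out: node["name"] for node in nodes for out in node["outputs"]}
    edges = set()
    for node in nodes:
        for dataset in node["inputs"]:
            # Free inputs (raw data, params:...) have no producing node.
            if dataset in producer_of:
                edges.add((producer_of[dataset], node["name"]))
    return edges


print(sorted(derive_node_edges(NODES)))
# [('preprocess', 'evaluate'), ('preprocess', 'train'), ('train', 'evaluate')]
```

Keeping the snapshot as flat input/output lists leaves each tool free to build whichever graph it needs (node-to-node, node-to-dataset, and so on).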

Proposed Approach

Prerequisites

All project dependencies must be installed. get_snapshot() calls bootstrap_project(), and reading the registered pipelines then triggers register_pipelines(), which imports all user pipeline modules. This is the same requirement as kedro run.

No KedroSession, no KedroContext

The inspection API does not create a KedroSession or KedroContext. These were designed for pipeline execution, not inspection, and bring overhead (hook registration, parameter materialisation, dataset instantiation) that is unnecessary for read-only structural queries. The API calls bootstrap_project() directly, then accesses _ProjectPipelines and OmegaConfigLoader independently.

Pydantic Models

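from pydantic import BaseModel

class ProjectMetadataSnapshot(BaseModel):
    # Subset of the ProjectMetadata returned by bootstrap_project()
    project_name: str
    package_name: str
    kedro_version: str

class NodeSnapshot(BaseModel):
    name: str
    namespace: str | None = None
    tags: list[str] = []     # Pydantic copies mutable defaults per instance
    inputs: list[str] = []   # dataset names consumed by the node
    outputs: list[str] = []  # dataset names produced by the node

class PipelineSnapshot(BaseModel):
    id: str
    name: str
    nodes: list[NodeSnapshot]
    inputs: list[str] = []
    outputs: list[str] = []

class DatasetSnapshot(BaseModel):
    name: str
    type: str                    # dataset type string as written in catalog.yml
    filepath: str | None = None

class ProjectSnapshot(BaseModel):
    metadata: ProjectMetadataSnapshot
    pipelines: list[PipelineSnapshot]
    datasets: dict[str, DatasetSnapshot]  # keyed by dataset name
    parameter_keys: list[str] = []        # parameter names only, no values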
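The thread later settled on dataclasses instead of Pydantic to keep Pydantic out of Kedro core. A stdlib sketch of the same models might look like the following; frozen=True is a design assumption reflecting the read-only intent, not something the proposal specifies.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)  # frozen: snapshots are read-only (assumption)
class ProjectMetadataSnapshot:
    project_name: str
    package_name: str
    kedro_version: str


@dataclass(frozen=True)
class NodeSnapshot:
    name: str
    namespace: str | None = None
    # default_factory gives each instance its own list instead of a shared one
    tags: list[str] = field(default_factory=list)
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class PipelineSnapshot:
    id: str
    name: str
    nodes: list[NodeSnapshot]
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class DatasetSnapshot:
    name: str
    type: str
    filepath: str | None = None


@dataclass(frozen=True)
class ProjectSnapshot:
    metadata: ProjectMetadataSnapshot
    pipelines: list[PipelineSnapshot]
    datasets: dict[str, DatasetSnapshot]
    parameter_keys: list[str] = field(default_factory=list)
```

Unlike Pydantic, a dataclass must not use a bare `[]` default; `field(default_factory=list)` is the standard replacement.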

Public API Surface

from kedro.inspection import get_snapshot
from kedro.inspection.snapshot import (
    build_project_snapshot,
    build_metadata_snapshot,
    build_catalog_snapshot,
    build_pipeline_snapshots,
)
from kedro.inspection.models import (
    ProjectSnapshot,
    PipelineSnapshot,
    NodeSnapshot,
    DatasetSnapshot,
    ProjectMetadataSnapshot,
)

get_snapshot(project_path, env) → ProjectSnapshot

Primary entry point. Builds and returns a full ProjectSnapshot.

from kedro.inspection import get_snapshot

snapshot = get_snapshot("/path/to/project")
print(snapshot.model_dump_json(indent=2))

build_metadata_snapshot(metadata) → ProjectMetadataSnapshot

Converts the ProjectMetadata namedtuple returned by bootstrap_project() into a ProjectMetadataSnapshot.
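A minimal sketch of this conversion, assuming a stand-in NamedTuple with the same fields as Kedro's ProjectMetadata (quoted later in this thread). Where kedro_version comes from is not specified in the proposal; this sketch assumes kedro_init_version, though the real builder might report the installed kedro.__version__ instead.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import NamedTuple


class ProjectMetadata(NamedTuple):
    """Stand-in mirroring the fields of Kedro's ProjectMetadata."""

    config_file: Path
    package_name: str
    project_name: str
    project_path: Path
    source_dir: Path
    kedro_init_version: str
    tools: list
    example_pipeline: str


@dataclass(frozen=True)
class ProjectMetadataSnapshot:
    project_name: str
    package_name: str
    kedro_version: str


def build_metadata_snapshot(metadata: ProjectMetadata) -> ProjectMetadataSnapshot:
    # Machine-specific paths (config_file, project_path, source_dir) are dropped.
    # Assumption: kedro_version is taken from kedro_init_version.
    return ProjectMetadataSnapshot(
        project_name=metadata.project_name,
        package_name=metadata.package_name,
        kedro_version=metadata.kedro_init_version,
    )
```

Dropping the Path fields keeps the snapshot portable between machines, which is one plausible reading of why ProjectMetadataSnapshot is smaller than ProjectMetadata.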

build_catalog_snapshot(project_path, env) → tuple[dict[str, DatasetSnapshot], list[str]]

Reads catalog.yml and parameters.yml via OmegaConfigLoader and CatalogConfigResolver. No dataset classes are instantiated.
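The key property is that dataset types stay as strings, so no dataset class is imported. A sketch of the mapping step, assuming the input dicts are what `OmegaConfigLoader(...)["catalog"]` and `["parameters"]` return; the helper names are hypothetical. Factory patterns such as `{name}_csv` are skipped here on the assumption that CatalogConfigResolver handles them, and dotted nested parameter keys are an assumption about what parameter_keys should contain.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class DatasetSnapshot:
    name: str
    type: str
    filepath: str | None = None


def catalog_to_snapshots(catalog_conf: dict[str, dict[str, Any]]) -> dict[str, DatasetSnapshot]:
    """Map raw catalog entries to DatasetSnapshot without importing dataset classes."""
    snapshots = {}
    for name, entry in catalog_conf.items():
        # Skip factory patterns and "_"-prefixed templating keys (assumption).
        if "{" in name or name.startswith("_"):
            continue
        snapshots[name] = DatasetSnapshot(
            name=name,
            type=str(entry["type"]),  # kept as the YAML string, never imported
            filepath=entry.get("filepath"),
        )
    return snapshots


def parameter_keys(params: dict[str, Any], prefix: str = "") -> list[str]:
    """Flatten nested parameters into dotted keys, e.g. "model_options.test_size"."""
    keys = []
    for key, value in params.items():
        full_key = f"{prefix}{key}"
        keys.append(full_key)
        if isinstance(value, dict):
            keys.extend(parameter_keys(value, prefix=f"{full_key}."))
    return keys
```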

build_pipeline_snapshots() → list[PipelineSnapshot]

Triggers register_pipelines() via _ProjectPipelines. Requires configure_project() to have been called (via bootstrap_project()).
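A sketch of the conversion, using only attributes Kedro's Node and Pipeline expose (`name`, `namespace`, `tags`, `inputs`, `outputs`, `Pipeline.nodes`, `Pipeline.inputs()`, `Pipeline.outputs()`). Unlike the proposed zero-argument signature, this hypothetical version takes the registry as an argument so it can run without a project; in Kedro the registry would be `kedro.framework.project.pipelines`. Using the registry key as both id and name is an assumption.

```python
from __future__ import annotations

from collections.abc import Mapping
from dataclasses import dataclass, field
from types import SimpleNamespace
from typing import Any


@dataclass(frozen=True)
class NodeSnapshot:
    name: str
    namespace: str | None = None
    tags: list[str] = field(default_factory=list)
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class PipelineSnapshot:
    id: str
    name: str
    nodes: list[NodeSnapshot]
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


def build_pipeline_snapshots(registry: Mapping[str, Any]) -> list[PipelineSnapshot]:
    """Convert registered pipelines (registry key -> pipeline) into snapshots."""
    snapshots = []
    for pipeline_id, pipeline in registry.items():
        nodes = [
            NodeSnapshot(
                name=node.name,
                namespace=node.namespace,
                tags=sorted(node.tags),  # tags are a set; sort for stable output
                inputs=list(node.inputs),
                outputs=list(node.outputs),
            )
            for node in pipeline.nodes
        ]
        snapshots.append(
            PipelineSnapshot(
                id=pipeline_id,
                name=pipeline_id,  # assumption: display name == registry key
                nodes=nodes,
                inputs=sorted(pipeline.inputs()),
                outputs=sorted(pipeline.outputs()),
            )
        )
    return snapshots


# Stand-in objects (hypothetical data) so the sketch runs without Kedro installed.
train = SimpleNamespace(name="train", namespace=None, tags={"ml"}, inputs=["clean_data"], outputs=["model"])
default = SimpleNamespace(nodes=[train], inputs=lambda: {"clean_data"}, outputs=lambda: {"model"})
print(build_pipeline_snapshots({"__default__": default}))
```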

build_project_snapshot(project_path, env) → ProjectSnapshot

Full orchestrator. Calls bootstrap_project() then assembles all snapshot components.

Potential Drawbacks

All pipeline dependencies are required. Projects whose virtual environment is not activated will get an import error when pipeline data is requested. This is an explicitly accepted v1 limitation.

Timeline

For implementation: ~1-1.5 weeks
Completion with reviews: ~2-2.5 weeks

Appendix A: Module Structure

kedro/inspection/
    __init__.py      # Public API: get_snapshot, build_project_snapshot, model re-exports
    models.py        # Pydantic models, no kedro.framework imports
    snapshot.py      # Snapshot builders, no KedroSession or KedroContext

Dependency graph (no cycles):

kedro.inspection
    → kedro.inspection.models         (stdlib + pydantic only)
    → kedro.inspection.snapshot
        → kedro.framework.project     (_global_pipelines, settings)
        → kedro.framework.startup     (bootstrap_project)
        → kedro.config                (OmegaConfigLoader)
        → kedro.io                    (CatalogConfigResolver)

Appendix B: Rejected Designs

B1. Inspection via KedroSession/KedroContext

Loading a full KedroSession for inspection runs hook registration, instantiates datasets, and materialises parameters, all unnecessary for a read-only structural snapshot. Rejected in favour of direct bootstrap_project() + individual loader access.

B2. Separate kedro-inspect package

Keeps core lean but fragments the API across packages. Every plugin would need an additional dependency. Since the inspection models and builders are tightly coupled to Kedro internals, maintaining them in core is more reliable and gives better guarantees across Kedro releases. Rejected.

B3. Dependency-free inspection via AST scanning and module mocking

Prototyped in this spike: AST-scan pipeline files for imports, mock missing modules with MagicMock, then load pipeline_registry.py safely. This is technically viable, but more importantly, the root problem is architectural: why does any Kedro command other than kedro run need all pipeline dependencies installed? This is tracked as a separate spike (see dependency spike ticket) and is explicitly out of scope for v1.
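For reference, the rejected approach could be sketched roughly as follows. This is an illustration, not the spike's prototype code, and the function names are hypothetical. Every dotted prefix of a missing import is registered so that `import a.b` also resolves to a mock.

```python
import ast
import importlib.util
import sys
from unittest.mock import MagicMock


def imported_module_names(source: str) -> set[str]:
    """Collect absolute module names imported by a Python source file."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            # level > 0 is a relative import inside the project package: skip it
            names.add(node.module)
    return names


def mock_missing_modules(source: str) -> list[str]:
    """Insert MagicMock placeholders into sys.modules for uninstalled imports."""
    names = imported_module_names(source)
    missing_roots = {
        name.split(".")[0]
        for name in names
        if name.split(".")[0] not in sys.modules
        and importlib.util.find_spec(name.split(".")[0]) is None
    }
    mocked = []
    for name in sorted(names):
        parts = name.split(".")
        if parts[0] not in missing_roots:
            continue
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            if prefix not in sys.modules:
                sys.modules[prefix] = MagicMock(name=prefix)
                mocked.append(prefix)
    return mocked
```

The weakness is visible in the code itself: any attribute of a mock "exists", so errors that depend on real library behaviour are silently hidden, which supports treating the underlying dependency problem as an architectural question.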

Are you fine with the feature implementation and overall architecture?

Please review the proposed approach and flag any concerns.

Please vote +1/−1 in comments

Thank you

datajoelypx · 2026-02-26T16:18:08Z

+1


noklam · 2026-02-26T22:58:17Z

Collaborator

Love the idea of having a core interface defined formally.

I am less sure about the naming of inspection, and the pydantic models (and using pydantic for interface). I have not spent too much time to think about it.


rashidakanchwala · 2026-03-02T13:55:50Z

Maintainer

+1,

based on Nok's comment... maybe we call it

kedro/snapshot/
    __init__.py      
    models.py        
    builder.py


deepyaman · 2026-03-02T17:56:02Z

Collaborator

On the same page as @noklam.

Love the idea of having a core interface defined formally.

The fact that all of the individual plugins (both core and community-maintained) redefine the same things, and that these depend on internal attributes and need to be maintained across Kedro version changes, is a huge selling point; I've definitely been burned trying to maintain compatibility across versions in the past.

Slightly less clear to me how much this actually helps standardize? E.g. Kedro-Airflow constructs the dependency graph, but #5266 seems to indicate this should be part of a standalone package. I assume Kedro-Viz benefits most, and I'm too unfamiliar with vscode-kedro to judge the impact/savings.

I am less sure about the naming of inspection, and the pydantic models (and using pydantic for interface). I have not spent too much time to think about it.

Not sure what Nok had in mind, but at least I intuitively preferred kedro.inspect to kedro.inspection--shorter, more aligned with standard library module names. However, most of Kedro top-level modules are nouns, so maybe @rashidakanchwala's snapshot is good--unless this is limiting for future inspection-related capabilities? Also worth comparing with validation.

Biggest concern is Pydantic as a core dependency. Is it absolutely necessary?

If you take Pydantic as a core dependency, then you need to think about implications of supporting Pydantic v1/v2/potentially a v3. Not sure if this proposed set of snapshots is unlocking enough extra to be worth that.
If you do depend on Pydantic, is the inspection package optional? I guess it's fair for Kedro-Viz to depend on kedro[inspect]. If it's not optional, then why is the parameter validation piece making it optional?

@ravi-kumar-pilla Given you've specified that snapshots are read-only, and so you can't reconstruct Kedro objects (and therefore don't need to worry about users creating their own snapshots somehow and building Kedro objects from them), I would think Pydantic/snapshot validation isn't that necessary.


ravi-kumar-pilla · 2026-03-02T18:42:05Z

Collaborator Author

Hi @deepyaman ,

Slightly less clear to me how much this actually helps standardize

Yes, v1 will not immediately eliminate everything in the individual plugins. Thin adapters will still be needed on the plugin side to tailor the snapshot to each plugin's needs. These documents give more detail on how much it helps plugins: https://github.com/kedro-org/kedro/blob/spike/inspect-api/docs/inspect-docs/kedro-inspect-consumer-summary.md, https://github.com/kedro-org/kedro/blob/spike/inspect-api/docs/inspect-docs/kedro-inspect-consumer-analysis.md

However, most of Kedro top-level modules are nouns, so maybe @rashidakanchwala's snapshot is good--unless this is limiting for future inspection-related capabilities

Yes, @rashidakanchwala and I discussed this. I agree about the future capabilities we might support with this layer. Validation might be included in the API contract of the inspection layer, but the core focus is on inspection (read-only snapshots for now). I would actually prefer inspection, but I can ask the TSC on Slack to decide on the name.

Some options to consider:

kedro.inspect
kedro.snapshot
kedro.describe
kedro.profile
kedro.query

Biggest concern is Pydantic as a core dependency.

Pydantic will be an optional dependency, available through an extras group such as kedro[inspect] (or whatever name we settle on for the package).
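With an extras group, the module that needs the optional dependency usually imports it lazily and fails with an actionable message. A hypothetical helper sketching that pattern (the extra name `inspect` is the placeholder from the comment above, not a settled name):

```python
import importlib
from types import ModuleType


def require_optional(module_name: str, extra: str = "inspect") -> ModuleType:
    """Import an optional dependency, or explain which extra provides it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise ImportError(
            f"'{module_name}' is required for this feature. "
            f'Install it with: pip install "kedro[{extra}]"'
        ) from exc
```

For example, a models module could call `pydantic = require_optional("pydantic")` at import time, so users without the extra get an install hint instead of a bare ModuleNotFoundError.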

Given you've specified that snapshots are read-only, and so you can't reconstruct Kedro objects (and therefore don't need to worry about users creating their own snapshots somehow and building Kedro objects from them), I would think Pydantic/snapshot validation isn't that necessary.

You're right that validation isn't the reason. The reason is serialization and schema generation.

Pydantic is being used here as a serialization layer, not a validation layer. The value is:

Free .model_dump(): nested models serialize recursively without any boilerplate. With plain dataclasses, you'd need to write asdict() wrappers that handle nested objects, etc.
JSON schema generation: ProjectSnapshot.model_json_schema() gives you a machine-readable schema, useful for documentation, IDE tooling, and any future REST/CLI consumer.

If the team pushes back, we can do dataclass + asdict - just more boilerplate to maintain.
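For comparison, a minimal sketch of the dataclass + asdict route (trimmed, hypothetical models). `dataclasses.asdict()` already recurses into nested dataclasses, lists and dicts, so the remaining boilerplate is mostly JSON encoding of types json cannot handle natively; what the standard library does not provide is an equivalent of model_json_schema().

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class NodeSnapshot:
    name: str
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)


@dataclass(frozen=True)
class PipelineSnapshot:
    id: str
    nodes: list[NodeSnapshot]


def to_json(snapshot, **kwargs) -> str:
    # asdict() converts nested dataclasses recursively; default=str is a
    # fallback for values json cannot encode natively (e.g. pathlib.Path).
    return json.dumps(asdict(snapshot), default=str, **kwargs)


pipeline = PipelineSnapshot(
    id="__default__",
    nodes=[NodeSnapshot("train", ["clean_data"], ["model"])],
)
print(to_json(pipeline, indent=2))
```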

I will post this on Slack as well so everyone can vote on these two topics, and we can then take this KEP to implementation.

Thank you.

2 replies

deepyaman Mar 2, 2026
Collaborator

Biggest concern is Pydantic as a core dependency.

Pydantic will be an optional dependency, available through an extras group such as kedro[inspect] (or whatever name we settle on for the package).

Less concerned if it's optional, especially since Pydantic is already required by some of the consumers, as I recall.

ravi-kumar-pilla Mar 2, 2026
Collaborator Author

Thanks for reviewing the proposal. I opened a Slack vote on these two topics. I will consolidate the results and update the implementation accordingly.

merelcht · 2026-03-03T16:36:36Z

Maintainer

+1

Re: the Pydantic concern, I suggest we open a separate KEP (or simply a discussion) about whether we use Pydantic or Dataclasses as a standard throughout the framework.


noklam · 2026-03-04T13:01:32Z

Collaborator

The core purpose of this module needs to be clearly defined, right now it's mixing concerns. Is it a ser/deser layer, or a formal interface to Kedro internals? These are fundamentally different goals and shouldn't be conflated.

B2. Separate kedro-inspect package

Keeps core lean but fragments the API across packages. Every plugin would need an additional dependency. Since the inspection models and builders are tightly coupled to Kedro internals, maintaining them in core is more reliable and gives better guarantees across Kedro releases. Rejected.

If the intent is to provide an interface for external libraries, the ser/deser layer is unnecessary overhead. When the interface is the JSON/OpenAPI spec, a separate package isn't really a problem, since the interface is the spec, the downstream should not require any dependencies.

A concrete example would make this much clearer. The key question is: what is the actual contract these snapshots define? Take ProjectMetadataSnapshot as an example: it's a lot simpler than the ProjectMetadata currently defined in core. What does it actually mean?

from pathlib import Path
from typing import NamedTuple

class ProjectMetadata(NamedTuple):
    """Structure holding project metadata derived from `pyproject.toml`"""

    config_file: Path
    package_name: str
    project_name: str
    project_path: Path
    source_dir: Path
    kedro_init_version: str
    tools: list
    example_pipeline: str

3 replies

ravi-kumar-pilla Mar 4, 2026
Collaborator Author

Hi @noklam ,

Thanks for the comments.

The core purpose of this module needs to be clearly defined, right now it's mixing concerns. Is it a ser/deser layer, or a formal interface to Kedro internals? These are fundamentally different goals and shouldn't be conflated.

Agreed. For now, the core purpose of this layer is to introduce a formal interface to Kedro internals. It is not a ser/deser layer (though it would be great to move it closer to one). The inspection layer originally stemmed from building a REST API layer in Kedro. However, the intent was always that the REST API is just one use case; inspection itself should remain independent of it. This KEP is about defining a project snapshot for Kedro, which can then be exposed either directly via Python or through a REST API.

If the intent is to provide an interface for external libraries, the ser/deser layer is unnecessary overhead. When the interface is the JSON/OpenAPI spec, a separate package isn't really a problem, since the interface is the spec, the downstream should not require any dependencies.

I am not sure I understood this completely, but for v1 we want to target programmatic (Python) usage. A spec alone might work well for HTTP consumers.

A concrete example would make this much clearer.

The exact fields each snapshot model holds are up for review when the PR comes out. I would like v1 to support the same fields as the core classes, but after a few discussions we decided to keep only the most essential fields.

Please let me know what you think.

noklam Mar 4, 2026
Collaborator

Breakage when Kedro internals change (plugins tied to private APIs)
This goal is the clearest one to me, though I have a question about the role of this module relative to the abstract classes we currently have. Take datasets as an example: would this mean we move toward using this module instead of the original interface?

No single authoritative answer to "what does this project look like?"
With this snapshot, would I be able to reconstruct the project? This may be an implementation problem, but in Kedro there are many things that are not serialisable, e.g. resolvers and dynamically defined objects. How are these going to work?

Do you have an example or idea of how these snapshots would be consumed downstream? I think that would help.

ravi-kumar-pilla Mar 4, 2026
Collaborator Author

I have a question about the role of this module relative to the abstract classes we currently have

This module will provide a project snapshot, i.e. everything combined. Today we have the catalog, pipelines, nodes, etc. individually, each with its own APIs. This is an effort to combine everything into a formal interface for users, plugin developers, tools, etc.

With this snapshot, would I be able to reconstruct the project

Good question. That would be the ideal end state (i.e., a ser/deser layer). For v1, though, we want to keep this module simple, then make incremental changes until the layer can reconstruct a Kedro project from a snapshot.

This may be an implementation problem, but in Kedro there are many things that are not serialisable, e.g. resolvers and dynamically defined objects. How are these going to work?

This was raised by @ElenaKhaustova as well. I haven't worked through every scenario for handling non-serialisable objects, since the goal for v1 is to be simple, provide a formal interface, and serialise most of the fields (based on how kedro-viz, kedro-airflow and vscode-kedro use them).

Do you have an example or idea of how these snapshots would be consumed downstream? I think that would help.

I do not have a working POC yet, but I can definitely build one. The idea is to use the snapshots from Kedro and have a thin adapter layer in each plugin to customise the snapshot to the plugin's needs. In kedro-viz we use a lot of Pydantic classes and also keep live Kedro objects in the backend. Most of this would not be required with a layer like this. You can find a consumer analysis summary here: https://github.com/kedro-org/kedro/blob/spike/inspect-api/docs/inspect-docs/kedro-inspect-consumer-summary.md
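To make the "thin adapter" idea concrete, here is a hypothetical plugin-side adapter that works only on the serialised snapshot, with no live Kedro objects. It groups a pipeline's nodes by namespace, roughly the kind of view a visualisation tool might build. The JSON below is made-up example data showing only the fields the adapter uses, and the `<root>` group name is an arbitrary choice.

```python
import json
from collections import defaultdict

# Hypothetical get_snapshot(...) output, trimmed to the fields used here.
SNAPSHOT_JSON = """
{
  "pipelines": [
    {"id": "__default__", "name": "__default__", "nodes": [
      {"name": "data_processing.preprocess", "namespace": "data_processing"},
      {"name": "data_science.train", "namespace": "data_science"},
      {"name": "report", "namespace": null}
    ]}
  ]
}
"""


def nodes_by_namespace(snapshot: dict, pipeline_id: str = "__default__") -> dict[str, list[str]]:
    """Plugin-side adapter: group a pipeline's node names by namespace."""
    pipeline = next(p for p in snapshot["pipelines"] if p["id"] == pipeline_id)
    groups: dict[str, list[str]] = defaultdict(list)
    for node in pipeline["nodes"]:
        # Nodes without a namespace go into a catch-all "<root>" group.
        groups[node["namespace"] or "<root>"].append(node["name"])
    return dict(groups)


print(nodes_by_namespace(json.loads(SNAPSHOT_JSON)))
```

Because the adapter depends only on the snapshot's shape, it keeps working across Kedro releases as long as the snapshot contract is stable, which is the main benefit argued for in the proposal.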

ravi-kumar-pilla · 2026-03-06T00:25:52Z

Collaborator Author

Since there are no downvotes, the Inspection API will be implemented as a follow-up. For the inspection models, we will use dataclasses rather than Pydantic, to avoid making Pydantic a Kedro core dependency. Thank you everyone for your time.
