[KEP-1] Parameter Validation #5234
+1. Sounds like a feature many people could benefit from, and nobody is forced to adopt it. I was poking around in a colleague's code yesterday, and their custom dataset arguments are also validated using Pydantic models. I suppose this isn't necessary to cover, though, because people fully own custom dataset code, and it doesn't change anything for the dataset user. Perhaps worth thinking through. It would be helpful to make this proposal self-contained; I think a lot of the key information (e.g. usage examples in nodes, the fact that it creates Pydantic objects, etc.) has to be chased down through links and isn't easily parsed by somebody just reading this proposal. Furthermore, https://github.com/kedro-org/kedro/blob/test/validate_no_hook/kedro/framework/validation/README.md points to a branch that can disappear or be changed, and in general I didn't get a clear idea of what it's trying to say on a first pass. #5110 and #5161 seemed more helpful to me, but I also had the benefit of attending the technical design session on this topic.
Hi @deepyaman, I have updated the proposal with more information to make it self-contained. I attached links to give the TSC all the information around this discussion without overwhelming them with content. Let me know whether the updated information adds value and makes this clearer. If not, I am happy to add further information in the discussion. Thank you!
Pydantic has a @validate_call decorator that does this out of the box. I guess the only difference is that validation happens during the node run rather than before the pipeline run. But the workflow is very simple: you wrap the function that becomes a node with that decorator, and it automatically parses the dict that Kedro supplies to the function into an instance of the BaseModel.
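A minimal sketch of the `@validate_call` approach from the comment above, assuming Pydantic v2. `TrainingParams` and `train_model` are hypothetical names; the function would be registered as a Kedro node with an input such as `"params:training"`, so it receives a plain dict.

```python
from pydantic import BaseModel, validate_call


class TrainingParams(BaseModel):
    # Hypothetical parameter schema for a training node
    learning_rate: float
    epochs: int = 10


@validate_call
def train_model(params: TrainingParams) -> str:
    # By the time the body runs, Pydantic has converted the incoming dict
    # into a TrainingParams instance (or raised ValidationError).
    return f"lr={params.learning_rate}, epochs={params.epochs}"


# Kedro would pass the "params:training" entry as a plain dict
print(train_model({"learning_rate": 0.01, "epochs": 5}))
```

An invalid value such as `{"learning_rate": "fast"}` raises `pydantic.ValidationError`, but only when the node executes, which is the trade-off against the fail-fast validation this proposal aims for.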
Hi Team, thank you for voting and participating in the discussion. This proposal will be implemented in subtasks, which can be tracked in #5110. The discussion is now closed. Thank you!
Context: Add parameter validation to Kedro (#5110)
Spike: #5161
Authors: @ravi-kumar-pilla
KEP shepherd: @ravi-kumar-pilla
What are we trying to do
Introduce native parameter validation in Kedro. The goal is to enable early (fail-fast) validation of parameters and provide type-safe integration with future features such as JSON Schema export, UI auto-generation in Kedro-Viz, and MCP/LLM interfaces.
Example (Pydantic)
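A minimal sketch of what Pydantic-based parameters could look like, assuming Pydantic v2 and the pattern the Overview describes (a node argument type-hinted with a model that the framework instantiates from `params:` entries). `ModelOptions`, `split_data`, and the `model_options` key are hypothetical. `model_json_schema()` is the Pydantic v2 method that could back the JSON Schema export mentioned above.

```python
from pydantic import BaseModel, Field

# conf/base/parameters.yml (hypothetical entry):
# model_options:
#   test_size: 0.2
#   random_state: 3
#   features: [engines, passenger_capacity]


class ModelOptions(BaseModel):
    # Field constraints are checked when the model is instantiated
    test_size: float = Field(gt=0, lt=1)
    random_state: int
    features: list[str]


def split_data(data, options: ModelOptions):
    # Hypothetical node: under the proposal, it would receive a ModelOptions
    # instance built from "params:model_options" instead of a raw dict.
    ...


raw = {"test_size": 0.2, "random_state": 3, "features": ["engines", "passenger_capacity"]}
options = ModelOptions.model_validate(raw)
schema = ModelOptions.model_json_schema()  # could feed Kedro-Viz forms or MCP tools
print(options.test_size, sorted(schema["properties"]))
```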
What this is NOT
How is it done today, and what are the limits of current practice
Kedro does not currently provide parameter validation or any data validation.
What is new in your approach and why do you think it will be successful
A Validation Framework within Kedro
Overview
The Kedro Validation Framework validates and instantiates typed inputs for pipeline nodes. This framework automatically discovers type requirements from pipeline node signatures using configurable source filters and validates them before they are used, ensuring type safety and early error detection.
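A stdlib-only sketch of how type requirements might be discovered from a node signature. It assumes a node's `inputs` list lines up positionally with the function's parameters (dict-style inputs are not handled) and that only `params:` inputs are of interest. `extract_param_types` is a hypothetical helper, not the framework's actual `TypeExtractor`.

```python
import inspect
import typing
from dataclasses import dataclass


@dataclass
class SplitOptions:
    test_size: float
    random_state: int


def split_data(data: dict, options: SplitOptions) -> dict:
    return data


def extract_param_types(func, inputs: list[str]) -> dict[str, type]:
    """Map each 'params:<key>' input to the type hint of the matching argument."""
    hints = typing.get_type_hints(func)
    arg_names = list(inspect.signature(func).parameters)
    requirements = {}
    for arg_name, input_name in zip(arg_names, inputs):
        # Only parameter inputs whose argument carries an annotation are collected
        if input_name.startswith("params:") and arg_name in hints:
            requirements[input_name.removeprefix("params:")] = hints[arg_name]
    return requirements


print(extract_param_types(split_data, ["companies", "params:split_options"]))
```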
Architecture
The validation framework consists of five main components:
(The framework is extensible to any source type through pluggable source filters, but the POC focuses on ParameterValidator.)
Core Components
- ParameterValidator (validators/parameter_validator.py): Main orchestrator that coordinates parameter validation
- TypeExtractor (type_extractor.py): Discovers type requirements from pipelines using configurable source filters
- ModelFactory (model_factory.py): Creates instances of Pydantic classes and dataclasses
- SourceFilters (source_filters.py): Pluggable filters for different source types (parameters, datasets)
- Custom Exceptions (exceptions.py): Specialized error handling

Integration Points
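The source filters are only described as pluggable. One plausible shape, sketched here as an assumption, is a small protocol that decides whether a node input belongs to a source type and how to turn it into a lookup key. `SourceFilter`, `ParameterFilter`, `DatasetFilter`, and `classify` are hypothetical names.

```python
from typing import Optional, Protocol


class SourceFilter(Protocol):
    # Hypothetical interface: each filter claims the node inputs it understands
    source_type: str

    def extract_key(self, input_name: str) -> Optional[str]:
        """Return a lookup key if this filter handles the input, else None."""
        ...


class ParameterFilter:
    source_type = "parameters"

    def extract_key(self, input_name: str) -> Optional[str]:
        if input_name.startswith("params:"):
            return input_name[len("params:"):]
        return None


class DatasetFilter:
    source_type = "datasets"

    def extract_key(self, input_name: str) -> Optional[str]:
        # Treat anything that is not a parameter reference as a dataset name
        if input_name == "parameters" or input_name.startswith("params:"):
            return None
        return input_name


def classify(inputs: list[str], filters: list[SourceFilter]) -> dict[str, list[str]]:
    """Group node inputs by the first filter that claims them."""
    grouped: dict[str, list[str]] = {f.source_type: [] for f in filters}
    for name in inputs:
        for f in filters:
            key = f.extract_key(name)
            if key is not None:
                grouped[f.source_type].append(key)
                break
    return grouped


print(classify(["companies", "params:split_options"], [ParameterFilter(), DatasetFilter()]))
```

Adding a new source type would then mean adding one filter class rather than touching the extractor.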
Context-Level Integration
The validation framework integrates directly with the KedroContext.params property. Validation occurs when context.params is accessed, providing transparent validation without modifying existing code.
Parameter Validation Flow
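A stdlib-only sketch of the validate-and-instantiate step, assuming that Pydantic models are detected by their `model_validate` method and that all failures are reported together in one exception. `ParameterValidationError`, `instantiate`, and `validate_params` are hypothetical; note that plain dataclasses check field names here but not field types.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any


class ParameterValidationError(ValueError):
    """Hypothetical error that reports every failing parameter at once."""

    def __init__(self, errors: dict[str, str]):
        self.errors = errors
        details = "; ".join(f"{key}: {msg}" for key, msg in errors.items())
        super().__init__(f"Invalid parameters: {details}")


def instantiate(model: type, raw: Any) -> Any:
    # Pydantic v2 models expose model_validate; use it when present
    if hasattr(model, "model_validate"):
        return model.model_validate(raw)
    if is_dataclass(model):
        if not isinstance(raw, dict):
            raise TypeError(f"expected a mapping, got {type(raw).__name__}")
        unknown = set(raw) - {f.name for f in fields(model)}
        if unknown:
            raise TypeError(f"unexpected keys {sorted(unknown)}")
        return model(**raw)  # raises TypeError if a required field is missing
    return raw  # other annotations (e.g. plain float) pass through unchanged


def validate_params(params: dict[str, Any], requirements: dict[str, type]) -> dict[str, Any]:
    """Return params with typed instances substituted, reporting all failures together."""
    validated = dict(params)
    errors: dict[str, str] = {}
    for key, model in requirements.items():
        if key not in params:
            errors[key] = "missing from configuration"
            continue
        try:
            validated[key] = instantiate(model, params[key])
        except (TypeError, ValueError) as exc:  # pydantic.ValidationError is a ValueError
            errors[key] = str(exc)
    if errors:
        raise ParameterValidationError(errors)
    return validated


@dataclass
class SplitOptions:
    test_size: float
    random_state: int


typed = validate_params(
    {"split_options": {"test_size": 0.2, "random_state": 3}, "other": 1},
    {"split_options": SplitOptions},
)
print(typed["split_options"])
```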
Key Features & Benefits
Strengths
Drawbacks
Usage
Run your Kedro project with validation:
    kedro run
Reference
https://github.com/kedro-org/kedro/blob/test/validate_no_hook/kedro/framework/validation/README.md
Who cares? If you are successful, what difference will it make?
What are the risks?
Benchmark Results:
More information on benchmark results here
Additional maintenance burden, since the framework modifies Kedro core.
How long will it take?
~ 2 weeks
Appendix A: Proposed API Changes
Context-Level Integration
The validation framework integrates directly with the KedroContext.params property.
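A hypothetical stand-in for the `KedroContext.params` integration, illustrating the cache-then-validate flow from the design sketch in Appendix B. `ContextSketch` is not Kedro's class, the `validate` callable is an assumed plug-in point, and the merge is shallow for brevity.

```python
from typing import Any, Callable, Optional


class ContextSketch:
    """Hypothetical stand-in for KedroContext, showing only the params property."""

    def __init__(
        self,
        config_params: dict[str, Any],
        runtime_params: dict[str, Any],
        validate: Callable[[dict[str, Any]], dict[str, Any]],
    ):
        self._config_params = config_params
        self._runtime_params = runtime_params
        self._validate = validate
        self._cache: Optional[dict[str, Any]] = None
        self.validation_calls = 0  # counts how often validation actually ran

    @property
    def params(self) -> dict[str, Any]:
        # Cache hit: return the already-validated, typed objects
        if self._cache is not None:
            return self._cache
        # Cache miss: runtime params (e.g. from `kedro run --params`) override config
        merged = {**self._config_params, **self._runtime_params}
        self.validation_calls += 1
        self._cache = self._validate(merged)
        return self._cache


def coerce_rate(params: dict[str, Any]) -> dict[str, Any]:
    # Trivial validator for the demo: coerce "learning_rate" to float
    return {**params, "learning_rate": float(params["learning_rate"])}


ctx = ContextSketch({"learning_rate": "0.1", "epochs": 5}, {"epochs": 8}, coerce_rate)
print(ctx.params)
_ = ctx.params  # second access hits the cache
print(ctx.validation_calls)
```

Caching keeps repeated `context.params` access cheap, which matters because validation now runs inside a property that existing code may read many times.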
More details here - https://github.com/kedro-org/kedro/blob/test/validate_no_hook/kedro/framework/validation/README.md
Appendix B. Optional Design Sketch:
```mermaid
graph TD
    %% =============================
    %% SECTION 1: Validation Flow
    %% =============================
    subgraph VF["Validation Flow"]
        F["context.params Access"] --> G{"Cache Hit?"}
        G -->|Yes| H["Return Cached Objects"]
        G -->|No| I["Merge Config + Runtime Params"]
        I --> A["ParameterValidator"]
        A --> B["TypeExtractor"]
        B --> J["Extracted Type Definitions"]
        J --> C["ModelFactory"]
        C --> K["Create Typed Instances"]
        K --> L["Cache & Return"]
    end

    %% =============================
    %% SECTION 2: Dependencies
    %% =============================
    subgraph DEP["Supporting Components"]
        B --> D["SourceFilters"]
        B --> E["Custom Exceptions"]
        C --> E
        D --> M["Parameters"]
        D --> N["Datasets"]
    end

    %% =============================
    %% STYLES
    %% =============================
    style VF fill:#fafafa,stroke:#ccc,stroke-width:1px
    style DEP fill:#fafafa,stroke:#ccc,stroke-width:1px
    style A fill:#e3f2fd,stroke:#90caf9,stroke-width:1px
    style B fill:#f3e5f5,stroke:#ce93d8,stroke-width:1px
    style C fill:#fff9c4,stroke:#fbc02d,stroke-width:1px
    style D fill:#fce4ec,stroke:#f48fb1,stroke-width:1px
    style E fill:#ffe0b2,stroke:#ffb74d,stroke-width:1px
    style F fill:#e8f5e9,stroke:#81c784,stroke-width:1px
    style G fill:#fff3e0,stroke:#ffb74d,stroke-width:1px
    style H fill:#c8e6c9,stroke:#66bb6a,stroke-width:1px
    style I fill:#fffde7,stroke:#fdd835,stroke-width:1px
    style J fill:#ede7f6,stroke:#9575cd,stroke-width:1px
    style K fill:#dcedc8,stroke:#8bc34a,stroke-width:1px
    style L fill:#66bb6a,color:#fff,stroke-width:1px
    style M fill:#f1f8e9,stroke:#aed581,stroke-width:1px
    style N fill:#f1f8e9,stroke:#aed581,stroke-width:1px
```
Appendix C. Optional Rejected Designs
Plugin approach: A standalone kedro-validator plugin using hooks for validation and instantiation. Lightweight and safe for experimentation but requires users to install it manually.
Session-level integration: Built-in hooks within Kedro core, optionally triggered via a --validate flag. Avoids separate installation but is still hook-based.
Monkey-patching parameters: Directly modifies context.params after validation for performance gains.
Related Discussions