Dependency Injection alternatives #5258
Replies: 7 comments
-
I imagine the only way to do this is using an AbstractDataset, right?

```python
from kedro.io import AbstractDataset
from redis import Redis


class RedisDataset(AbstractDataset):
    def __init__(self, host: str, port: int, db: int, name: str):
        self.host = host
        self.port = port
        self.db = db
        self.name = name

    def _load(self):
        return Redis(host=self.host, port=self.port, db=self.db)

    def _save(self, data):
        return Redis(host=self.host, port=self.port, db=self.db).set(self.name, data)

    def _describe(self):
        return dict(
            host=self.host,
            port=self.port,
            db=self.db,
            name=self.name,
        )
```
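For a quick sanity check outside a pipeline, a dataset like this can be exercised directly. The sketch below mirrors the class above but swaps `redis.Redis` for a hypothetical in-memory stand-in (`FakeRedis` is made up for illustration), so it runs without Kedro or a Redis server; it also shows why `_save` must pass a key to `set`:

```python
class FakeRedis:
    """Stand-in for redis.Redis so the sketch runs without a server (hypothetical)."""

    store = {}  # class-level so separate clients see the same "server"

    def __init__(self, **kwargs):
        self.kwargs = kwargs

    def set(self, name, value):
        FakeRedis.store[name] = value
        return True


class RedisDatasetSketch:
    """Mirror of the dataset above, parameterized on the client class for testing."""

    def __init__(self, host, port, db, name, client_cls=FakeRedis):
        self.host, self.port, self.db, self.name = host, port, db, name
        self._client_cls = client_cls

    def _load(self):
        return self._client_cls(host=self.host, port=self.port, db=self.db)

    def _save(self, data):
        # Note: redis's `set` takes (name, value), hence self.name here.
        self._client_cls(host=self.host, port=self.port, db=self.db).set(self.name, data)


ds = RedisDatasetSketch("localhost", 6379, 0, "my_key")
ds._save("hello")
```

Swapping the real client back in only requires passing `client_cls=Redis`; everything else stays the same.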
-
Yes, I think that is the best way to do it right now, with some annoying limits, reinstantiating the connection on each node being one of them. There is a detailed four-year-old discussion here, #904, that is still very valid and worth reading. The conclusion is that we (as the Kedro team) do not want to let users perform I/O operations at the node level, to enforce reproducibility. I am personally not fully aligned with this conclusion, and I think we should support some of these use cases, which I made clear in the issue, but the conclusion still stands for now.
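The reinstantiation cost mentioned above can be softened inside the dataset itself by creating the client lazily and reusing it, assuming one dataset instance is shared across node runs. A minimal sketch (class and names are hypothetical, not Kedro API):

```python
from typing import Any, Callable


class CachedConnectionDataset:
    """Sketch: create the client lazily and reuse it for every load/save.

    `make_client` stands in for a real factory such as `redis.Redis`;
    this class is illustrative, not part of Kedro's API.
    """

    def __init__(self, make_client: Callable[[], Any]):
        self._make_client = make_client
        self._client = None

    def _connection(self) -> Any:
        # Only the first call actually opens a connection.
        if self._client is None:
            self._client = self._make_client()
        return self._client
```

Repeated `_connection()` calls return the same object, so every node that goes through the same dataset instance reuses one client instead of reconnecting.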
-
The discussion looks very related. At some point it becomes necessary to do things quickly, but I understand that one of Kedro's purposes is also lineage, and moving logic into nodes may cause that feature to be lost.

I don't fully understand the entire Kedro codebase, but perhaps a kind of "Container" dataset (similar to another context, but meant to share information between nodes) could be a solution. I imagine an alternative using a container from something like the Python Dependency Injector library, to avoid creating objects before they're needed (I think the term for this case is "transient object"). Another alternative would be to use classes instead of functions and inject dependencies in the init method.

BTW, I don't want to reopen a closed discussion; I understand the points and the purpose of Kedro, and it seems the project is going in another direction.
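The "transient object" idea mentioned above can be sketched in plain Python without any DI library (all names here are made up for illustration): a container holds factories, and construction is deferred until something actually resolves the dependency, so every `resolve` call yields a fresh object.

```python
from typing import Any, Callable, Dict


class Container:
    """Minimal DI container sketch: register factories, build objects on demand.

    'Transient' means every `resolve` call constructs a fresh object;
    nothing is created before it is needed.
    """

    def __init__(self):
        self._factories: Dict[str, Callable[[], Any]] = {}

    def register(self, name: str, factory: Callable[[], Any]) -> None:
        self._factories[name] = factory

    def resolve(self, name: str) -> Any:
        # Construction is deferred to this point.
        return self._factories[name]()


container = Container()
container.register("client_config", lambda: {"host": "localhost", "port": 6379})
client_config = container.resolve("client_config")
```

A node-as-class could then take the container (or a single resolved object) in its `__init__`, which is the injection-in-init alternative described above.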
-
I think this would be something akin to Dagster's. I haven't thought in any depth about this, but I'm generally supportive of (at the very least, exploring) the idea. If somebody is motivated to support this and wants to put together a proposal, I think we have a path forward now via the KEP process.

I'd also be curious about thoughts from people who are familiar with Dagster (or any other product with a similar concept), to see if they feel this is something missing in Kedro.

Cc @gtauzin @datajoely. FYI @DimedS @ankatiyar, in case you see something relevant as you look at Dagster.
-
Don't worry, the impact of such choices is still (not to say increasingly) debatable:
-
Apart from the MLflow case, I felt an
-
I'm going to move this to a Discussion, because it is more like that than an actual ticket to be worked on for now. Let's keep the conversation going there!
-
Description
Reading the documentation at a glance, I am not able to find how to solve the problem of dependency injection ("DI") with different strategies, or at least how to do it (for example, for plugins).
It would be nice to explain that; perhaps it's already explained and I didn't find it.
Context
For example, if I want to use Dask directly in my node (without the runner), how should I do it?
The same goes for MLflow, boto3, Redis, or whatever: how do I receive a configured object in my nodes? Through the context? Using datasets?