How to run a Kedro pipeline interactively like a function

I would like to run Kedro pipelines in a Jupyter notebook with different inputs, something like this:

data = catalog.load('my_dataset')
params = catalog.load('params:my_params')
pipelines['my_pipeline'](data=data, params=params)

Is there a way to do this? Also, if I have to feed some inputs to a node other than the first one (for example, the second node), how would this be done?

CodePudding user response：

To run the pipeline in Jupyter, you can use one of the available runners or a custom one. SequentialRunner is one example and can be used as follows:


from kedro.runner import SequentialRunner

SequentialRunner().run(pipeline=your_pipeline, catalog=your_catalog)

If you are using kedro jupyter lab or kedro jupyter notebook, the catalog variable is already available; otherwise you can create one with DataCatalog(). You can add datasets to a DataCatalog with the add_feed_dict method, or with the add and save methods (set the replace flag to True if you want to overwrite a dataset that is already in the catalog):

import pandas as pd
from kedro.io import DataCatalog

df = pd.DataFrame({'col_1': [0, 1], 'col_2': [1, 2]})
catalog = DataCatalog()

# Register df as "new_dataset"; replace=True overwrites an existing entry of that name.
catalog.add_feed_dict({"new_dataset": df}, replace=True)

If you want to start a pipeline from a specific node after changing some entries in the catalog, you can use the from_inputs method of Pipeline objects. It takes dataset names and returns a new pipeline containing only the nodes that depend on them.

CodePudding user response：

We actually have a native way to use Kedro in notebook environments; check out the docs here.