How to run a kedro pipeline interactively like a function

Time:10-04

I would like to run Kedro pipelines in a Jupyter notebook with different inputs, something like this:

data = catalog.load('my_dataset')
params = catalog.load('params:my_params')
pipelines['my_pipeline'](data=data, params=params)

Is there a way to do this? Also, if I have to feed some inputs to a node other than the first one (for example, the second node), how would this be done?

CodePudding user response:

To run the pipeline in Jupyter, you can use one of the available runners or a custom one. The SequentialRunner is one example and can be used as follows:


from kedro.runner import SequentialRunner

SequentialRunner().run(pipeline=your_pipeline, catalog=your_catalog)

If you started the session with kedro jupyter lab or kedro jupyter notebook, the catalog is already available; otherwise you can create one using DataCatalog(). You can add datasets to your DataCatalog using the add_feed_dict method, or the add and save methods (set the replace flag to True if you want to overwrite a dataset that is already in the catalog).

import pandas as pd
from kedro.io.data_catalog import DataCatalog

df = pd.DataFrame({'col_1': [0, 1], 'col_2': [1, 2]})
io = DataCatalog()

io.add_feed_dict({"new_dataset": df}, replace=True)

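If you want to start a pipeline from a specific node after changing some entries in the catalog, you can use the from_inputs method of Pipeline objects. It takes dataset names and returns a new pipeline containing only the nodes that depend, directly or transitively, on those datasets.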

CodePudding user response:

We actually have a native way to use Kedro in notebook environments; check out the docs here.
