I would like to run kedro pipelines in jupyter notebook with different inputs, so something like this:
data = catalog.load('my_dataset')
params = catalog.load('params:my_params')
pipelines['my_pipeline'](data=data, params=params)
Is there a way to do this? Also, if I have to feed some inputs to nodes other than the starting one (for example, the second node), how would this be done?
CodePudding user response:
Concerning running the pipeline in Jupyter, you can use one of the available runners, or a custom one. The SequentialRunner is an example and can be used as follows:
from kedro.runner import SequentialRunner
SequentialRunner().run(pipeline=your_pipeline, catalog=your_catalog)
If you launched the notebook with kedro jupyter lab or kedro jupyter notebook, the catalog variable is already available; otherwise you can create one with DataCatalog(). You can add datasets to your DataCatalog using the add_feed_dict method, or the add and save methods (set the replace flag to True if you want to overwrite an existing dataset in the catalog):
import pandas as pd
from kedro.io.data_catalog import DataCatalog

df = pd.DataFrame({'col_1': [0, 1], 'col_2': [1, 2]})

io = DataCatalog()
# register the in-memory dataframe under the name "new_dataset"
io.add_feed_dict({"new_dataset": df}, replace=True)
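The add and save route mentioned above looks roughly like this (a minimal sketch reusing io and df from the snippet above; MemoryDataSet is Kedro's in-memory dataset class, and new_memory_dataset is just an illustrative name):
from kedro.io import MemoryDataSet

# register an (initially empty) in-memory dataset, then write the dataframe into it
io.add("new_memory_dataset", MemoryDataSet(), replace=True)
io.save("new_memory_dataset", df)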
If you want to start a pipeline from a specific node after having changed some entries in the catalog, you can use the from_inputs method that Pipeline objects have, for example:
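Here is a rough sketch combining this with the runner above (new_dataset is the entry added to the catalog earlier, my_pipeline is your pipeline name, and the catalog must still contain every input the sliced pipeline needs):
from kedro.runner import SequentialRunner

# keep only the nodes that (directly or transitively) depend on "new_dataset",
# i.e. skip everything upstream of it
sliced = pipelines["my_pipeline"].from_inputs("new_dataset")
SequentialRunner().run(pipeline=sliced, catalog=io)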
CodePudding user response:
We actually have a native way to use Kedro in notebook environments; check out the docs here.
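For reference, a minimal sketch of that workflow (this assumes a recent Kedro version that ships the kedro.ipython extension; my_dataset and my_pipeline are the placeholders from the question):
%load_ext kedro.ipython
# the extension injects catalog, context, pipelines and session into the notebook

data = catalog.load("my_dataset")   # inspect or load any catalog entry
# run only the part of the pipeline downstream of my_dataset
session.run(pipeline_name="my_pipeline", from_inputs=["my_dataset"])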