How to create a dictionary with list values from a DataFrame

I have a dataframe, like so:

Projects	Goals	Steps
Project A	Goal 1	NaN
Project A	NaN	Step 1
Project A	NaN	Step 2
Project A	Goal 2	NaN
Project A	NaN	Step 3
Project A	NaN	Step 4
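To reproduce the example, the sample frame above can be built like this. This is a sketch that assumes the NaN cells are real missing values (`numpy.nan`):

```python
import numpy as np
import pandas as pd

# Goal rows carry a goal and no step; step rows carry a step and no goal
df = pd.DataFrame({
    'Projects': ['Project A'] * 6,
    'Goals': ['Goal 1', np.nan, np.nan, 'Goal 2', np.nan, np.nan],
    'Steps': [np.nan, 'Step 1', 'Step 2', np.nan, 'Step 3', 'Step 4'],
})
print(df)
```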

The real data isn't sorted by project; I've shown only one project here to keep the example easy to follow.

I'm trying to build a nested dictionary from it in the following format: {'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}, 'Project B': ...}

Any ideas how to solve this?

Answer:

You can build the dictionary by first forward-filling 'Goals' and dropping the goal-only rows, then grouping successively by 'Projects' and 'Goals' and aggregating the steps with list:

import pandas as pd

# Carry each goal down to the step rows beneath it, then drop the goal-only rows
df['Goals'] = df['Goals'].ffill()
df = df.dropna()

dict_out = {}
for proj, sub_df in df.groupby('Projects'):
    # Collect the steps of each goal into a list: {goal: [steps]}
    dict_out[proj] = sub_df.groupby('Goals')['Steps'].agg(list).to_dict()

print(dict_out)

Output:

{'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}}
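The forward-fill and dropna lines in this answer do the key preprocessing: each goal is copied onto the step rows below it, and the goal-only rows (the only ones whose 'Steps' is NaN) are then dropped. A self-contained sketch on the sample data, showing the intermediate frame that gets grouped:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Projects': ['Project A'] * 6,
    'Goals': ['Goal 1', np.nan, np.nan, 'Goal 2', np.nan, np.nan],
    'Steps': [np.nan, 'Step 1', 'Step 2', np.nan, 'Step 3', 'Step 4'],
})

# Copy each goal down onto the step rows beneath it
df['Goals'] = df['Goals'].ffill()
# After the fill, only the goal-only rows still contain a NaN (in 'Steps')
df = df.dropna()

print(df[['Goals', 'Steps']].values.tolist())
# [['Goal 1', 'Step 1'], ['Goal 1', 'Step 2'], ['Goal 2', 'Step 3'], ['Goal 2', 'Step 4']]
```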


Answer:

Step 1: deal with the NaNs (forward-fill the goals and drop the goal-only rows).
Step 2: group by project, then by goal, and build the dictionary.

For example:

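import pandas as pd
# Step 1: forward-fill the goals, then drop the rows without a step
df['Goals'] = df['Goals'].ffill()
df = df.dropna()
# Step 2: map each project to {goal: [steps]}
result = {k: f.groupby('Goals')['Steps'].apply(list).to_dict()
          for k, f in df.groupby('Projects')}
print(result)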

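Alternative solution (run after the same ffill/dropna step), grouping on both keys at once so that goal names shared by different projects stay separate:

steps = df.groupby(['Projects', 'Goals'])['Steps'].agg(list)
{p: g.droplevel(0).to_dict() for p, g in steps.groupby(level=0)}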

Output:

{'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}}
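Both answers rely on each goal row appearing directly above its own steps. Since the question notes that the real data isn't sorted by project, a plain ffill could carry a goal from one project onto another project's steps if their rows are interleaved. A sketch that forward-fills within each project instead, assuming that inside a single project the rows are still in goal-then-steps order (the Project B rows are hypothetical sample data):

```python
import numpy as np
import pandas as pd

# Hypothetical sample: rows of Project A and Project B are interleaved
df = pd.DataFrame({
    'Projects': ['Project A', 'Project B', 'Project A', 'Project B', 'Project A'],
    'Goals': ['Goal 1', 'Goal X', np.nan, np.nan, np.nan],
    'Steps': [np.nan, np.nan, 'Step 1', 'Step X1', 'Step 2'],
})

# Forward-fill goals separately within each project, not down the whole column
df['Goals'] = df.groupby('Projects')['Goals'].ffill()
df = df.dropna()

result = {proj: g.groupby('Goals')['Steps'].agg(list).to_dict()
          for proj, g in df.groupby('Projects')}
print(result)
# {'Project A': {'Goal 1': ['Step 1', 'Step 2']}, 'Project B': {'Goal X': ['Step X1']}}
```

With a plain df['Goals'].ffill() on this frame, 'Step 1' would wrongly be assigned to 'Goal X', because the Project B goal row sits between 'Goal 1' and 'Step 1'.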