I have a dataframe, like so:
Projects | Goals | Steps |
---|---|---|
Project A | Goal 1 | NaN |
Project A | NaN | Step 1 |
Project A | NaN | Step 2 |
Project A | Goal 2 | NaN |
Project A | NaN | Step 3 |
Project A | NaN | Step 4 |
It's not sorted by project; I just used a single project to make the example easier to follow.
I'm trying to create a nested dictionary out of it, in the following format: {'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}, 'Project B': ...}
Any ideas how to solve this?
CodePudding user response:
You can build your dictionary by forward-filling 'Goals', dropping the rows that have no step, and then grouping by 'Projects' and, within each project, by 'Goals', aggregating the steps with list:
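# Carry each goal down to its step rows, then drop the goal-only rows
df["Goals"] = df["Goals"].ffill()
df = df.dropna()
dict_out = {}
for proj, sub_df in df.groupby("Projects"):
    # One row per goal, with that goal's steps collected into a list
    sub_df = sub_df.drop("Projects", axis=1).groupby("Goals").agg(list)
    dict_out[proj] = sub_df.to_dict()["Steps"]
print(dict_out)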
Output:
{'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}}
Edit: forgot ffill and dropna in my copy/paste
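To make the ffill and dropna step concrete, here is a minimal sketch that rebuilds the sample frame from the question and shows what is left after cleaning. The frame construction and the name `clean` are my own additions for illustration; the question does not show how the dataframe was created.

```python
import numpy as np
import pandas as pd

# Sample frame rebuilt from the table in the question (Project A only).
df = pd.DataFrame({
    "Projects": ["Project A"] * 6,
    "Goals": ["Goal 1", np.nan, np.nan, "Goal 2", np.nan, np.nan],
    "Steps": [np.nan, "Step 1", "Step 2", np.nan, "Step 3", "Step 4"],
})

# Carry each goal down onto the step rows that follow it.
df["Goals"] = df["Goals"].ffill()

# The goal header rows have no step, so dropping rows with NaN removes them.
clean = df.dropna().reset_index(drop=True)
print(clean)
```

After this, every remaining row pairs one step with its goal, which is the shape the groupby needs.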
CodePudding user response:
This could help you:
Step 1: Deal with the NaNs (forward-fill Goals, then drop the rows that have no step).
Step 2: Group by project, then by goal, and build the dictionary.
df['Goals'] = df['Goals'].ffill()
df = df.dropna()
{k: f.groupby('Goals', dropna=True)['Steps'].apply(list).to_dict()
for k, f in df.groupby('Projects')}
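Alternative Solution (this returns a list of records, one per project, rather than a single nested dictionary, and it assumes goal names are unique across projects):
df.groupby(['Goals']).agg({'Steps': list, 'Projects': 'first'}).groupby(['Projects']).agg(dict).reset_index().to_dict(orient='records')
O/P of the dictionary comprehension:
{'Project A': {'Goal 1': ['Step 1', 'Step 2'], 'Goal 2': ['Step 3', 'Step 4']}}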
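If goal names can repeat across projects, grouping on both columns at once keeps them apart. The following is only a sketch of that idea, not something from the question: the Project B rows are made up for illustration, and `nested` is a name I picked.

```python
import numpy as np
import pandas as pd

# Project A comes from the question; Project B is hypothetical sample data
# that deliberately reuses the name "Goal 1".
df = pd.DataFrame({
    "Projects": ["Project A"] * 6 + ["Project B"] * 3,
    "Goals": ["Goal 1", np.nan, np.nan, "Goal 2", np.nan, np.nan,
              "Goal 1", np.nan, np.nan],
    "Steps": [np.nan, "Step 1", "Step 2", np.nan, "Step 3", "Step 4",
              np.nan, "Step 5", "Step 6"],
})

df["Goals"] = df["Goals"].ffill()
df = df.dropna(subset=["Steps"])

# Group on (project, goal) pairs; sort=False keeps first-appearance order.
nested = {}
for (proj, goal), steps in df.groupby(["Projects", "Goals"], sort=False)["Steps"]:
    nested.setdefault(proj, {})[goal] = steps.tolist()

print(nested)
```

Because the group key is the (project, goal) pair, "Goal 1" of Project B stays separate from "Goal 1" of Project A.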