I have a dataframe that has the attributes "key"
, "name"
, and "value
". For each "key"
group, I want the attributes "name"
to be stored in an array. The "value"
of each name would then be stored into it's own column named "<name>_value"
. For example:
key name value
0 A Patton 2
1 A Arthur 2
2 B Will 1
3 B Patton 1
The dataframe above should be transformed as such:
key name Patton_value Arthur_value Will_value
0 A [Patton, Arthur] 2 2 null
1 B [Patton, Will] 1 null 1
What I have attempted so far is:
df.groupby('key').name.apply(list).reset_index()
But I don't know how to preserve the values for the "name"
and save them as a column.
CodePudding user response:
There might be more concise ways to do it, but this works:
D = df.pivot(columns='name', index='key', values='value').reset_index().rename_axis(None, axis=1)
D['name'] = D.apply(lambda x: df['name'][df['key'] == x['key']].values, axis=1).reset_index(drop=True)
D = D[[D.columns[0], D.columns[-1], *D.columns[1:-1]]]
D.columns = [*D.columns[:2], *D.columns[2:] '_value']
Output:
>>> D
key name Arthur_value Patton_value Will_value
0 A [Patton, Arthur] 2.0 2.0 NaN
1 B [Will, Patton] NaN 1.0 1.0
CodePudding user response:
You can create a name_list
column and pivot the dataframe:
df['name_list'] = df['key'].map(df.groupby('key')['name'].agg(tuple))
out = df.pivot_table(values='value', index=['key','name_list'], columns='name').add_suffix('_value').reset_index().rename({'name_list':'name'}, axis=1)
out['name'] = out['name'].apply(lambda x: list(x))
out.columns.name = None
Output:
key name Arthur_value Patton_value Will_value
0 A [Patton, Arthur] 2.0 2.0 NaN
1 B [Will, Patton] NaN 1.0 1.0