I have a dataframe like this:
df
name A B C D
0 name1 11 12 13 14
1 name2 21 22 23 24
And there is this dictionary:
d = {'name1': {'1': {'0': 9, '1': 0, '2': 0},
'2': {'0': 14, '1': 8, '2': 10},
'3': {'0': 0, '1': 5, '2': 0}},
'name2': {'1': {'0': 6, '1': 0, '2': 7},
'2': {'0': 8, '1': 9, '2': 8},
'3': {'0': 0, '1': 5, '2': 10}},
'name3': {'1': {'0': 3, '1': 97, '2': 0},
'2': {'0': 36, '1': 50, '2': 1},
'3': {'0': 11, '1': 78, '2': 0}}}
I want to get a result like this:
name A B C D 1 2 3
0 name1 11 12 13 14 9 14 0
0 8 5
0 10 0
1 name2 21 22 23 24 6 8 0
0 9 5
7 8 10
I tried to solve it myself. But my solution is pretty rough. Can you suggest a better solution? Here's what I did:
df_d = pd.DataFrame.from_dict({(i,j): d[i][j]
for i in d.keys()
for j in d[i].keys()},
orient='index')
df_d.index.names = ['name','rating']
df_d= df_d.stack().unstack(level='rating')
df_d.reset_index()
df_result = pd.merge(df, df_d, how='left', on='prod_name')
df_result = df_result.reset_index(drop=True).set_index(['name','A', 'B', 'C', 'D'])
CodePudding user response:
Maybe not cleanest, but with fewer commands:
df = df.merge(
pd.DataFrame(d).T.reset_index().rename(columns={"index": "name"}), on="name"
).set_index(["name", "A", "B", "C", "D"])
df = df.apply(lambda x: [d.values() for d in x]).explode(df.columns.to_list())
print(df)
Prints:
1 2 3
name A B C D
name1 11 12 13 14 9 14 0
14 0 8 5
14 0 10 0
name2 21 22 23 24 6 8 0
24 0 9 5
24 7 8 10
CodePudding user response:
Here is a proposition to get a MultiIndex dataframe :
df_d= pd.concat({k: pd.DataFrame(v) for k, v in d.items()}, axis=0)
out1= (
df.set_index("name")
.merge(df_d.droplevel(1, axis=0),
left_index=True, right_index=True)
.reset_index()
.rename(columns= {'index': 'name'})
.set_index(['name', 'A', 'B', 'C', 'D'])
)
print(out1)
1 2 3
name A B C D
name1 11 12 13 14 9 14 0
14 0 8 5
14 0 10 0
name2 21 22 23 24 6 8 0
24 0 9 5
24 7 8 10
Or to get a single index dataframe, you can replace pandas.DataFrame.set_index
by this :
import numpy as np
out2.loc[out2.duplicated('name'), ['name', 'A', 'B', 'C']] = np.NaN
print(out2)
name A B C D 1 2 3
0 name1 11.0 12.0 13.0 14 9 14 0
1 NaN NaN NaN NaN 14 0 8 5
2 NaN NaN NaN NaN 14 0 10 0
3 name2 21.0 22.0 23.0 24 6 8 0
4 NaN NaN NaN NaN 24 0 9 5
5 NaN NaN NaN NaN 24 7 8 10
CodePudding user response:
I think one problem with your approach is that you're setting columns A, B, C and D as index, which makes the resulting table not so "clean" or readable.
Besides you're dropping the keys/index of d
values, is that intended? I'd suggest to keep them to perform a merge on a multiindex:
df.set_index("name", inplace=True)
df.index = pd.MultiIndex.from_tuples([(x, "0") for x in df.index], names=["name", "id"])
df2 = pd.DataFrame()
for k, v in d.items():
df_tmp = pd.DataFrame(v)
df_tmp.index = pd.MultiIndex.from_product([[k], df_tmp.index], names=["name", "id"])
df2 = pd.concat([df2, df_tmp])
print(pd.merge(df, df2, on=["name", "id"], how="outer").sort_index())
Output:
A B C D 1 2 3
name id
name1 0 11.0 12.0 13.0 14.0 9 14 0
1 NaN NaN NaN NaN 0 8 5
2 NaN NaN NaN NaN 0 10 0
name2 0 21.0 22.0 23.0 24.0 6 8 0
1 NaN NaN NaN NaN 0 9 5
2 NaN NaN NaN NaN 7 8 10
name3 0 NaN NaN NaN NaN 3 36 11
1 NaN NaN NaN NaN 97 50 78
2 NaN NaN NaN NaN 0 1 0
Edit: fixed the key taken as index