Home > Blockchain >  How to merge a nested dictionary with a dataframe?
How to merge a nested dictionary with a dataframe?

Time:11-05

I have a dataframe like this:

df

     name     A     B     C    D
 0   name1   11    12    13   14
 1   name2   21    22    23   24

And there is this dictionary:

d = {'name1': {'1': {'0': 9, '1': 0, '2': 0},
               '2': {'0': 14, '1': 8, '2': 10},
               '3': {'0': 0, '1': 5, '2': 0}},
 
     'name2': {'1': {'0': 6, '1': 0, '2': 7},
               '2': {'0': 8, '1': 9, '2': 8},
               '3': {'0': 0, '1': 5, '2': 10}},
 
     'name3': {'1': {'0': 3, '1': 97, '2': 0},
               '2': {'0': 36, '1': 50, '2': 1},
               '3': {'0': 11, '1': 78, '2': 0}}}

I want to get a result like this:

     name     A    B     C    D     1     2     3
 0   name1   11    12    13   14    9     14    0
                                    0     8     5
                                    0     10    0
 1   name2   21    22    23   24    6     8     0
                                    0     9     5
                                    7     8     10

I tried to solve it myself. But my solution is pretty rough. Can you suggest a better solution? Here's what I did:

df_d = pd.DataFrame.from_dict({(i,j): d[i][j]
                         for i in d.keys()
                         for j in d[i].keys()},
                         orient='index')
df_d.index.names = ['name','rating']
df_d= df_d.stack().unstack(level='rating')
df_d.reset_index()
df_result = pd.merge(df, df_d, how='left', on='prod_name')
df_result = df_result.reset_index(drop=True).set_index(['name','A', 'B', 'C', 'D'])

CodePudding user response:

Maybe not cleanest, but with fewer commands:

df = df.merge(
    pd.DataFrame(d).T.reset_index().rename(columns={"index": "name"}), on="name"
).set_index(["name", "A", "B", "C", "D"])
df = df.apply(lambda x: [d.values() for d in x]).explode(df.columns.to_list())

print(df)

Prints:

                   1   2   3
name  A  B  C  D            
name1 11 12 13 14  9  14   0
               14  0   8   5
               14  0  10   0
name2 21 22 23 24  6   8   0
               24  0   9   5
               24  7   8  10

CodePudding user response:

Here is a proposition to get a MultiIndex dataframe :

df_d= pd.concat({k: pd.DataFrame(v) for k, v in d.items()}, axis=0)

out1= (
        df.set_index("name")
            .merge(df_d.droplevel(1, axis=0),
                   left_index=True, right_index=True)
            .reset_index()
            .rename(columns= {'index': 'name'})
            .set_index(['name', 'A', 'B', 'C', 'D'])
      )

print(out1)

                   1   2   3
name  A  B  C  D            
name1 11 12 13 14  9  14   0
               14  0   8   5
               14  0  10   0
name2 21 22 23 24  6   8   0
               24  0   9   5
               24  7   8  10

Or to get a single index dataframe, you can replace pandas.DataFrame.set_index by this :

import numpy as np

out2.loc[out2.duplicated('name'), ['name', 'A', 'B', 'C']] = np.NaN

print(out2)

    name     A     B     C   D  1   2   3
0  name1  11.0  12.0  13.0  14  9  14   0
1    NaN   NaN   NaN   NaN  14  0   8   5
2    NaN   NaN   NaN   NaN  14  0  10   0
3  name2  21.0  22.0  23.0  24  6   8   0
4    NaN   NaN   NaN   NaN  24  0   9   5
5    NaN   NaN   NaN   NaN  24  7   8  10

CodePudding user response:

I think one problem with your approach is that you're setting columns A, B, C and D as index, which makes the resulting table not so "clean" or readable.

Besides you're dropping the keys/index of d values, is that intended? I'd suggest to keep them to perform a merge on a multiindex:

df.set_index("name", inplace=True)
df.index = pd.MultiIndex.from_tuples([(x, "0") for x in df.index], names=["name", "id"])

df2 = pd.DataFrame()
for k, v in d.items():
    df_tmp = pd.DataFrame(v)
    df_tmp.index = pd.MultiIndex.from_product([[k], df_tmp.index], names=["name", "id"])
    df2 = pd.concat([df2, df_tmp])

print(pd.merge(df, df2, on=["name", "id"], how="outer").sort_index())

Output:

             A     B     C     D   1   2   3
name  id                                    
name1 0   11.0  12.0  13.0  14.0   9  14   0
      1    NaN   NaN   NaN   NaN   0   8   5
      2    NaN   NaN   NaN   NaN   0  10   0
name2 0   21.0  22.0  23.0  24.0   6   8   0
      1    NaN   NaN   NaN   NaN   0   9   5
      2    NaN   NaN   NaN   NaN   7   8  10
name3 0    NaN   NaN   NaN   NaN   3  36  11
      1    NaN   NaN   NaN   NaN  97  50  78
      2    NaN   NaN   NaN   NaN   0   1   0

Edit: fixed the key taken as index

  • Related