Home > Software design >  Exploding multiple dict columns and concatenating with original Pandas data frame
Exploding multiple dict columns and concatenating with original Pandas data frame

Time:05-14

I export a Postgres SQL query to create a Pandas data frame df similar to the following:

df = pd.DataFrame({
    'employee_id' : [123, 456, 789],
    'country_code' : ['US', 'CAN', 'MEX'],
    'sales' : [{'foo': 2, 'bar': 0, 'baz': 1},
               {'foo': 3, 'bar': 1, 'baz': 2},
               {'foo': 7, 'bar': 0, 'baz': 4}],
    'expenses' : [{'red': 1, 'white': 0, 'blue': 3},
               {'red': 1, 'white': 0, 'blue': 1},
               {'red': 2, 'white': 2, 'blue': 2}]
})

df
 
    employee_id   country_code      sales                             expenses
0   123           US                {'foo': 2, 'bar': 0, 'baz': 1}    {'red': 1, 'white': 0, 'blue': 3}
1   456           CAN               {'foo': 3, 'bar': 1, 'baz': 2}    {'red': 1, 'white': 0, 'blue': 1}
2   789           MEX               {'foo': 7, 'bar': 0, 'baz': 4}    {'red': 2, 'white': 2, 'blue': 2}

I would like to be able to explode both the sales and expenses columns so that their keys are separate columns. Currently, I'm only able to explode one of these columns, as follows:

df = pd.json_normalize(df['sales'])
df
    foo bar baz
0   2   0   1
1   3   1   2
2   7   0   4

I'm not able to pass a list of columns to pd.json.normalize().

Questions:

  1. How do I explode both the sales and expenses columns?
  2. After exploding both columns, how do I add back the other two columns (employee_id and country_code) from the original data frame?

The desired output is:

    employee_id   country_code   foo   bar   baz   red   white   blue
0   123           US             2     0     1     1     0       3
1   456           CAN            3     1     2     1     0       1
2   789           MEX            7     0     4     2     2       2

Thank you!

CodePudding user response:

You may use concat along axis=1 with json_normalize:

json_cols = ['sales','expenses']
result = pd.concat([pd.json_normalize(df[col]) for col in json_cols],axis=1)
result = pd.concat([df.drop(json_cols,axis=1),result],axis=1)

Output:

result

    employee_id country_code    foo bar baz red white   blue
0   123         US               2   0   1   1   0       3 
1   456         CAN              3   1   2   1   0       1
2   789         MEX              7   0   4   2   2       2

CodePudding user response:

If you want to modify the original dataframe, a variant would be:

cols = ['sales', 'expenses']

df = pd.concat([df] [pd.json_normalize(df.pop(c)) for c in cols], axis=1)

output:

   employee_id country_code  foo  bar  baz  red  white  blue
0          123           US    2    0    1    1      0     3
1          456          CAN    3    1    2    1      0     1
2          789          MEX    7    0    4    2      2     2
  • Related