I export the results of a PostgreSQL query into a Pandas data frame df
similar to the following:
df = pd.DataFrame({
'employee_id' : [123, 456, 789],
'country_code' : ['US', 'CAN', 'MEX'],
'sales' : [{'foo': 2, 'bar': 0, 'baz': 1},
{'foo': 3, 'bar': 1, 'baz': 2},
{'foo': 7, 'bar': 0, 'baz': 4}],
'expenses' : [{'red': 1, 'white': 0, 'blue': 3},
{'red': 1, 'white': 0, 'blue': 1},
{'red': 2, 'white': 2, 'blue': 2}]
})
df
employee_id country_code sales expenses
0 123 US {'foo': 2, 'bar': 0, 'baz': 1} {'red': 1, 'white': 0, 'blue': 3}
1 456 CAN {'foo': 3, 'bar': 1, 'baz': 2} {'red': 1, 'white': 0, 'blue': 1}
2 789 MEX {'foo': 7, 'bar': 0, 'baz': 4} {'red': 2, 'white': 2, 'blue': 2}
I would like to be able to explode both the sales and expenses columns so that their keys become separate columns. Currently, I'm only able to explode one of these columns at a time, as follows:
df = pd.json_normalize(df['sales'])
df
foo bar baz
0 2 0 1
1 3 1 2
2 7 0 4
I'm not able to pass a list of columns to pd.json_normalize().
Questions:
- How do I explode both the sales and expenses columns?
- After exploding both columns, how do I add back the other two columns (employee_id and country_code) from the original data frame?
The desired output is:
employee_id country_code foo bar baz red white blue
0 123 US 2 0 1 1 0 3
1 456 CAN 3 1 2 1 0 1
2 789 MEX 7 0 4 2 2 2
Thank you!
CodePudding user response:
You may use concat along axis=1 with json_normalize:
json_cols = ['sales', 'expenses']
result = pd.concat([pd.json_normalize(df[col]) for col in json_cols], axis=1)
result = pd.concat([df.drop(json_cols, axis=1), result], axis=1)
Output:
result
employee_id country_code foo bar baz red white blue
0 123 US 2 0 1 1 0 3
1 456 CAN 3 1 2 1 0 1
2 789 MEX 7 0 4 2 2 2
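One caveat worth adding (my own note, not part of the answer above): pd.json_normalize returns a frame with a fresh RangeIndex, so if df came back from the database with a non-default index, the axis=1 concat can misalign rows. A minimal sketch, assuming you want to guard against that by resetting the index first:
json_cols = ['sales', 'expenses']
base = df.drop(json_cols, axis=1).reset_index(drop=True)  # positional index to line up with the normalized frames
normalized = [pd.json_normalize(df[col].tolist()) for col in json_cols]  # .tolist() sidesteps index alignment entirely
result = pd.concat([base] + normalized, axis=1)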
CodePudding user response:
If you want to modify the original dataframe, a variant would be:
cols = ['sales', 'expenses']
df = pd.concat([df] + [pd.json_normalize(df.pop(c)) for c in cols], axis=1)
output:
employee_id country_code foo bar baz red white blue
0 123 US 2 0 1 1 0 3
1 456 CAN 3 1 2 1 0 1
2 789 MEX 7 0 4 2 2 2
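A further note from me (an assumption, not something the question requires): if the sales and expenses dicts ever shared a key, the expanded columns would collide. A small sketch that prefixes each group with its source column name using add_prefix:
cols = ['sales', 'expenses']
expanded = [pd.json_normalize(df.pop(c)).add_prefix(f'{c}_') for c in cols]  # sales_foo, ..., expenses_red, ...
df = pd.concat([df] + expanded, axis=1)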