Python: Convert JSON from df column into individual df columns-CodePudding

I have an excel file that looks something like this:

Header1	Header2	Header3
data	data	[{"key1":"123","key2":"Value1"},{"key1":"123","key2":"Value2"}, {"key1":"123","key2":"Value3"}]
data	data	[{"key1":"123","key2":"Value1"},{"key1":"123","key2":"Value2"}, {"key1":"123","key2":"Value3"}]

Header3 contains JSON strings that look like this

[
{"key1":"123","key2":"Value1"},
{"key1":"123","key2":"Value2"}, 
{"key1":"123","key2":"Value3"}
]

I would like to parse the JSON Header3 column and for each key create a column with the name of the key appended with the value of key2, the keys are always the same throughout the file.

The end data frame should look something like this:

Header1	Header2	Key1.Value1	Key2.Value1	Key1.Value2	Key2.Value2
data	data	123	Value1	123	Value2
data	data	123	Value1	123	Value2

Actual example:

Company	JSON
Amazon	[{"charge1":"500", "charge2":"200","card":"Visa"},{"charge1":"234", "charge2":"654","card":"Amex"}
Apple	[{"charge1":"689", "charge2":"433","card":"Visa"},{"charge1":"25434", "charge2":"6554644","card":"Amex"}]

Needs to become:

Company	charge1.Visa	charge2.Visa	card.Visa	charge1.AMEX	charge2.AMEX	card.AMEX
Amazon	500	200	Visa	234	654	Amex
Apple	689	433	Visa	25434	6554644	Amex

Before getting into fancy stuff I tried to at least normalize the data, but I'm returned with an empty series.

df = pd.read_excel('test.xlsx')
pd.json_normalize(df.JSON)

Output:
0
1
2
3
4
5
...
188 rows x 0 columns

CodePudding user response：

You can't normalize it because it's loaded as string from Excel. Try this:

import json

s = df["JSON"].apply(json.loads).explode()
tmp = (
    pd.DataFrame(s.to_list(), index=s.index)
    .set_index("card", append=True)
    .unstack()
)
tmp.columns = [".".join(col) for col in tmp.columns]

pd.concat([df[["Company"]], tmp], axis=1)

The card.* columns look kinda weird. If you know what column you are referring to, you already know its value so why include it in the output dataframe?