Home > Software engineering >  How to convert dataframe column which contains list of dictionary into separate columns?
How to convert dataframe column which contains list of dictionary into separate columns?

Time:11-23

I have a dataframe column which looks like this:

df_cost['region.localCurrency']:

0     [{'content': 'Dirham', 'languageCode': 'EN'}]
1     [{'content': 'Dirham', 'languageCode': 'EN'}]
2     [{'content': 'Dirham', 'languageCode': 'EN'}]
3       [{'content': 'Euro', 'languageCode': 'DE'}]
4       [{'content': 'Euro', 'languageCode': 'DE'}]
5       [{'content': 'Euro', 'languageCode': 'DE'}]
6       [{'content': 'Euro', 'languageCode': 'DE'}]
7       [{'content': 'Euro', 'languageCode': 'DE'}]
8       [{'content': 'Euro', 'languageCode': 'DE'}]
9       [{'content': 'Euro', 'languageCode': 'DE'}]
10      [{'content': 'Euro', 'languageCode': 'DE'}]
11      [{'content': 'Euro', 'languageCode': 'DE'}]
12      [{'content': 'Euro', 'languageCode': 'DE'}]
13    [{'content': 'Dirham', 'languageCode': 'EN'}]
14    [{'content': 'Dirham', 'languageCode': 'EN'}]
15    [{'content': 'Dirham', 'languageCode': 'EN'}]
16      [{'content': 'Euro', 'languageCode': 'DE'}]
17      [{'content': 'Euro', 'languageCode': 'DE'}]
18      [{'content': 'Euro', 'languageCode': 'DE'}]
19      [{'content': 'Euro', 'languageCode': 'DE'}]
Name: region.localCurrency, dtype: object

and I want to convert it, to separate the dictionary keys and values into columns. I want to add two separate columns to the initial df_cost dataframe, like 'localCurrencyContent' and 'localCurrencyCode', based on the dictionary contents of region.localCurrency. I tried to split the region.localCurrency column like:

df_split=pd.DataFrame(df_cost['region.localCurrency'].apply(pd.Series), columns=['localCurrencyContent', 'localCurrencyCode'])
print(df_split)

but this gives me NaN values for the localCurrencyContent and localCurrencyCode, instead of 'Euro' and 'DE' for example. How could I split the column "region.localCurrency" and add the two created columns to the cost_df, initial dataframe?

CodePudding user response:

Pandas.json_normalize will probably do the job for you. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html

CodePudding user response:

Use json_normalize with convert first values by indexing:

d = {'content':'localCurrencyContent','languageCode':'localCurrencyCode'}
df1 = pd.json_normalize(df_cost.pop('region.localCurrency').str[0]).rename(columns=d)
df = df_cost.join(df1)
  • Related