I have a dataframe column which looks like this:
df_cost['region.localCurrency']:
0 [{'content': 'Dirham', 'languageCode': 'EN'}]
1 [{'content': 'Dirham', 'languageCode': 'EN'}]
2 [{'content': 'Dirham', 'languageCode': 'EN'}]
3 [{'content': 'Euro', 'languageCode': 'DE'}]
4 [{'content': 'Euro', 'languageCode': 'DE'}]
5 [{'content': 'Euro', 'languageCode': 'DE'}]
6 [{'content': 'Euro', 'languageCode': 'DE'}]
7 [{'content': 'Euro', 'languageCode': 'DE'}]
8 [{'content': 'Euro', 'languageCode': 'DE'}]
9 [{'content': 'Euro', 'languageCode': 'DE'}]
10 [{'content': 'Euro', 'languageCode': 'DE'}]
11 [{'content': 'Euro', 'languageCode': 'DE'}]
12 [{'content': 'Euro', 'languageCode': 'DE'}]
13 [{'content': 'Dirham', 'languageCode': 'EN'}]
14 [{'content': 'Dirham', 'languageCode': 'EN'}]
15 [{'content': 'Dirham', 'languageCode': 'EN'}]
16 [{'content': 'Euro', 'languageCode': 'DE'}]
17 [{'content': 'Euro', 'languageCode': 'DE'}]
18 [{'content': 'Euro', 'languageCode': 'DE'}]
19 [{'content': 'Euro', 'languageCode': 'DE'}]
Name: region.localCurrency, dtype: object
and I want to convert it, to separate the dictionary keys and values into columns. I want to add two separate columns to the initial df_cost dataframe, like 'localCurrencyContent' and 'localCurrencyCode', based on the dictionary contents of region.localCurrency. I tried to split the region.localCurrency column like:
df_split=pd.DataFrame(df_cost['region.localCurrency'].apply(pd.Series), columns=['localCurrencyContent', 'localCurrencyCode'])
print(df_split)
but this gives me NaN values for the localCurrencyContent and localCurrencyCode, instead of 'Euro' and 'DE' for example. How could I split the column "region.localCurrency" and add the two created columns to the cost_df, initial dataframe?
CodePudding user response:
Pandas.json_normalize will probably do the job for you. https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.json_normalize.html
CodePudding user response:
Use json_normalize
with convert first values by indexing:
d = {'content':'localCurrencyContent','languageCode':'localCurrencyCode'}
df1 = pd.json_normalize(df_cost.pop('region.localCurrency').str[0]).rename(columns=d)
df = df_cost.join(df1)