Python: How to convert a column of dictionaries into columns

I have a DataFrame with a column that contains a dictionary as follows:

df:
    date                         dictionary
0   2021-01-01 00:00:00+00:00    {'Total': {'USD': 100, 'size': 20}, 'country': {'USA': {'income': 20000}, 'fees': {'total': 55}}}
1   2021-01-01 00:00:00+00:00    {'Total': {'EUR': 200, 'size': 40}, 'country': {'France': {'income': 10000}, 'fees': {'total': 30}}}
2   2021-01-02 00:00:00+00:00    {'Total': {'GBP': 100, 'size': 30}, 'country': {'UK': {'income': 23000}, 'fees': {'total': 24}}}

What I want is to use the country name (e.g. USA) as a column name and the total from fees as that column's value, with one row per date, to get the following:

df_final:
        date                         USA      France     UK
0       2021-01-01 00:00:00+00:00     55          30    NaN
1       2021-01-02 00:00:00+00:00    NaN         NaN     24

My DataFrame has hundreds of columns. I have tried the following:

df_list = []
for idx, row in df.iterrows():
    for dct in row['dictionary']:
        dct['date'] = row['date']
        df_list.append(dct)

But I get the following error, raised at the line dct['date'] = row['date']: TypeError: 'str' object does not support item assignment.
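A likely explanation (my reading, not confirmed in the question): whichever form the cell takes, the loop variable dct ends up a str. Iterating over a dict yields its keys, and iterating over a string yields its characters, so dct['date'] = ... tries to assign into a string. The sketch below reproduces this with an illustrative cell value (row_value is a made-up name, assumed to be a real dict):

```python
# Illustrative cell value in the question's layout (assumed to be a real dict).
row_value = {
    "Total": {"USD": 100, "size": 20},
    "country": {"USA": {"income": 20000}, "fees": {"total": 55}},
}

# Iterating over a dict yields its keys, which are plain strings here.
loop_values = [dct for dct in row_value]

# If the cell held the dict's string form instead, iteration yields characters.
loop_values_from_str = [dct for dct in str(row_value)][:3]

# Repeating the question's assignment on one of those strings fails.
error_message = None
try:
    loop_values[0]["date"] = "2021-01-01"
except TypeError as exc:
    error_message = str(exc)

print(loop_values)  # ['Total', 'country']
print(error_message)
```

So the fix is not in the assignment itself but in reaching into the dict by key (row['dictionary']['country']...) rather than looping over it.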

How can this be done?

EDIT: I added a few more rows to my DataFrame to better represent my problem.

Answer:

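A possible solution, which extracts only the USA value:

import pandas as pd

# Flatten the nested dicts (nested keys joined with '_') and take fees -> total
fees = pd.json_normalize(df['dictionary'].tolist(), sep='_')['country_fees_total']
df.assign(USA=fees.to_numpy()).drop('dictionary', axis=1)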

Output:

                        date  USA
0  2021-01-01 00:00:00+00:00   55
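To see where the name country_fees_total comes from, here is a sketch of what pd.json_normalize produces for a single record in the question's layout (the records list is sample data, assumed to hold real dicts). It also shows why this approach does not directly yield USA/France/UK columns: each country name ends up inside a column name such as country_USA_income, while the fee always lands in the same country_fees_total column.

```python
import pandas as pd

# One record in the nested layout from the question (sample data).
records = [
    {"Total": {"USD": 100, "size": 20},
     "country": {"USA": {"income": 20000}, "fees": {"total": 55}}},
]

# sep="_" joins nested keys with underscores instead of the default dots.
flat = pd.json_normalize(records, sep="_")
print(sorted(flat.columns))
# ['Total_USD', 'Total_size', 'country_USA_income', 'country_fees_total']
```

With all three rows, json_normalize would add country_France_income and country_UK_income columns (NaN where absent), so mapping fees back to a per-country column needs an extra step.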

Answer:

The first possibility I see is that your DataFrame contains valid JSON strings, like so:

import pandas as pd

df = pd.DataFrame({
    'date': [
        '2021-01-01 00:00:00', 
        '2021-01-01 00:00:00',
        '2021-01-02 00:00:00', 
    ],
    'dictionary': [ 
        '{"Total":{"USD":100, "size":20}, "country":{"USA": {"income": 20000}, "fees": {"total": 55}}}',
        '{"Total":{"EUR":200, "size":40}, "country":{"France": {"income": 10000}, "fees": {"total": 30}}}',
        '{"Total":{"GBP":100, "size":30}, "country":{"UK": {"income": 23000}, "fees": {"total": 24}}}',
    ]
})

df.date = pd.to_datetime(df.date)
df

Then you could do:

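import json

for idx, row in df.iterrows():
    # Parse the JSON string into a dict (avoid shadowing the built-in `dict`)
    record = json.loads(row.dictionary)
    # The country name is the first key of "country" (dicts keep insertion order)
    country = next(iter(record["country"]))
    # Create or fill a column named after the country with the fees total
    df.loc[idx, country] = record["country"]["fees"]["total"]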

# One row per calendar day; 'first' keeps the first non-null value in each column
df_final = (
    df.groupby(df.date.dt.date)
    .agg('first')
    .drop(columns=['date', 'dictionary'])
    .reset_index()
)

df_final
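Note that the strings displayed in the question use single quotes, which is not valid JSON, so json.loads would reject them. Assuming they are Python dict literals (outer braces included), ast.literal_eval can parse them safely; mapping it over the column turns the strings into real dicts, after which the second variant in this answer applies unchanged. A minimal sketch with one hypothetical sample string:

```python
import ast
import json

import pandas as pd

# Hypothetical sample: a dict literal with single quotes, as in the question.
raw = "{'Total': {'USD': 100, 'size': 20}, 'country': {'USA': {'income': 20000}, 'fees': {'total': 55}}}"

# json.loads rejects single-quoted keys.
try:
    json.loads(raw)
    json_ok = True
except json.JSONDecodeError:
    json_ok = False

# ast.literal_eval evaluates Python literals only (no arbitrary code).
parsed = ast.literal_eval(raw)

# Applied to a whole column, this converts every string into a dict.
as_dicts = pd.Series([raw]).map(ast.literal_eval)
print(as_dicts.iloc[0]["country"]["fees"]["total"])  # 55
```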

The second possibility is that your DataFrame contains actual Python dictionaries, like so:

import pandas as pd

df = pd.DataFrame({
    'date': [
        '2021-01-01 00:00:00', 
        '2021-01-01 00:00:00',
        '2021-01-02 00:00:00', 
    ],
    'dictionary': [ 
        {"Total":{"USD":100, "size":20}, "country":{"USA": {"income": 20000}, "fees": {"total": 55}}},
        {"Total":{"EUR":200, "size":40}, "country":{"France": {"income": 10000}, "fees": {"total": 30}}},
        {"Total":{"GBP":100, "size":30}, "country":{"UK": {"income": 23000}, "fees": {"total": 24}}},
    ]
})

df.date = pd.to_datetime(df.date)
df

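Then you would do the same, without parsing:

for idx, row in df.iterrows():
    # The cell is already a dict, so no json.loads is needed
    record = row.dictionary
    # The country name is the first key of "country" (dicts keep insertion order)
    country = next(iter(record["country"]))
    df.loc[idx, country] = record["country"]["fees"]["total"]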

# One row per calendar day; 'first' keeps the first non-null value in each column
df_final = (
    df.groupby(df.date.dt.date)
    .agg('first')
    .drop(columns=['date', 'dictionary'])
    .reset_index()
)

df_final
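The same result can be sketched without iterrows or adding columns to df inside a loop: extract one (country, fee) pair per row into a long table, then group by day and country and unstack. This assumes each "country" dict has exactly one key besides "fees" and that this key is the country name; country_and_fee and long are hypothetical names for illustration.

```python
import pandas as pd

# Sample data in the "real dictionaries" layout from the second variant.
df = pd.DataFrame({
    "date": pd.to_datetime(["2021-01-01", "2021-01-01", "2021-01-02"]),
    "dictionary": [
        {"Total": {"USD": 100, "size": 20}, "country": {"USA": {"income": 20000}, "fees": {"total": 55}}},
        {"Total": {"EUR": 200, "size": 40}, "country": {"France": {"income": 10000}, "fees": {"total": 30}}},
        {"Total": {"GBP": 100, "size": 30}, "country": {"UK": {"income": 23000}, "fees": {"total": 24}}},
    ],
})


def country_and_fee(record):
    """Return (country name, fees total) for one row's dictionary.

    Assumes "country" holds exactly one key besides "fees".
    """
    inner = record["country"]
    name = next(key for key in inner if key != "fees")
    return name, inner["fees"]["total"]


# Long format: one (country, fee) pair per original row.
long = pd.DataFrame(
    df["dictionary"].map(country_and_fee).tolist(),
    columns=["country", "fee"],
    index=df.index,
)
long["date"] = df["date"].dt.normalize()  # truncate to the calendar day

# One row per day, one column per country; if a country repeats on the
# same day, 'first' keeps the first fee. Missing combinations become NaN.
df_final = (
    long.groupby(["date", "country"])["fee"]
    .first()
    .unstack("country")
    .reset_index()
)
df_final.columns.name = None
print(df_final)
```

Building the long table first means the number of countries does not matter, and df itself is never modified while it is being iterated.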