Create new columns in a dataframe from a dictionary-like string column

I have a dataframe column that contains dictionary-like strings. They come in two forms. The first is a dictionary with quoted keys inside a string, such as '{"d":11,"g":0.8,"r":45}'. The second has unquoted keys, like this: '{d:18, g:0.1, r:75, f:6}'. The dataframe has several million rows, and I do not know in advance which rows use the first form and which use the second.

df_initial =       a     b    c                       kind
               0  0.50  bibi   23    '{"d":11,"g":0.8,"r":45}'
               1  0.80  cici  140     '{d:18, g:0.1, r:75, f:6}'
               2  0.01  didi  320  '{"d":101,"g":0.05,"r":32}'
               3  0.12  mimi    3         '{d:41,g:0.26,r:64}'

The desired dataframe:

df_final =       a     b    c                         kind    d     g   r    f
             0  0.50  bibi   23    '{"d":11,"g":0.8,"r":45}'   11  0.80  45  NaN
             1  0.80  cici  140   '{d:18, g:0.1, r:75, f:6}'   18  0.10  75  6.0
             2  0.01  didi  320  '{"d":101,"g":0.05,"r":32}'  101  0.05  32  NaN
             3  0.12  mimi    3         '{d:41,g:0.26,r:64}'   41  0.26  64  NaN

CodePudding user response：

Because the keys in some rows of the sample data are not quoted, as in {d:18, g:0.1, r:75, f:6}, both json.loads and ast.literal_eval failed for me, so I parsed the strings manually:

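# Strip the outer quotes and braces, drop double quotes, then split into key:value pairs.
# Note: every parsed key and value is still a string at this point.
L = [dict(y.split(':') for y in x.strip("'{} ").replace('"', '').replace(', ', ',').split(','))
     for x in df['kind']]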


df = df.join(pd.DataFrame(L, index=df.index))

print (df)
      a     b    c                         kind    d     g   r    f
0  0.50  bibi   23    '{"d":11,"g":0.8,"r":45}'   11   0.8  45  NaN
1  0.80  cici  140   '{d:18, g:0.1, r:75, f:6}'   18   0.1  75    6
2  0.01  didi  320  '{"d":101,"g":0.05,"r":32}'  101  0.05  32  NaN
3  0.12  mimi    3         '{d:41,g:0.26,r:64}'   41  0.26  64  NaN
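
The new columns produced this way hold strings, while the desired output in the question shows numbers. A minimal sketch of one way to convert them, assuming every value in the dictionaries is numeric, is to pass the parsed columns through pd.to_numeric before joining. The sample frame and the name parsed below are only illustrative:

```python
import pandas as pd

# Sample frame mirroring the question; the outer single quotes are part of the strings.
df = pd.DataFrame({
    "a": [0.50, 0.80, 0.01, 0.12],
    "b": ["bibi", "cici", "didi", "mimi"],
    "c": [23, 140, 320, 3],
    "kind": [
        "'{\"d\":11,\"g\":0.8,\"r\":45}'",
        "'{d:18, g:0.1, r:75, f:6}'",
        "'{\"d\":101,\"g\":0.05,\"r\":32}'",
        "'{d:41,g:0.26,r:64}'",
    ],
})

# Same parsing as in the answer: every key and value comes out as a string.
L = [dict(y.split(':') for y in x.strip("'{} ").replace('"', '').replace(', ', ',').split(','))
     for x in df['kind']]
parsed = pd.DataFrame(L, index=df.index)

# Convert each parsed column to a numeric dtype; keys missing from a row stay NaN.
parsed = parsed.apply(pd.to_numeric)

df = df.join(parsed)
print(df.dtypes)
```

If some values might not be numeric, pd.to_numeric raises by default; passing errors='coerce' would turn those values into NaN instead.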

CodePudding user response：

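You could use pandas.json_normalize to convert the kind column, once it has been parsed with json.loads, into a dataframe. Keep in mind that the dictionaries may contain many different keys, each of which becomes its own column, and that json.loads only accepts the rows whose keys are quoted.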

import json
import pandas as pd

df = pd.DataFrame([('a', '{"d":11,"g":0.8,"r":45}')], columns=['a', 'kind'])

In [6]: pd.json_normalize(df['kind'].apply(json.loads))
Out[6]:
    d    g   r
0  11  0.8  45

You can concat this new dataframe to the original along axis='columns' (equivalently axis=1) to get what you want:

In [11]: pd.concat([df, pd.json_normalize(df['kind'].apply(json.loads))], axis='columns')
Out[11]:
   a                     kind   d    g   r
0  a  {"d":11,"g":0.8,"r":45}  11  0.8  45
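
The json.loads call above only parses rows whose keys are quoted, so on its own it would fail on rows like '{d:18, g:0.1, r:75, f:6}'. One possible workaround, sketched here under the assumption that the dictionaries are flat, the keys are simple word-like names, and the values are numbers, is to quote the bare keys with a regular expression first. The helper parse_kind and the pattern _BARE_KEY are hypothetical names introduced for this sketch, not part of pandas or json:

```python
import json
import re

import pandas as pd

# Hypothetical pattern: a bare key that follows "{" or "," and precedes ":".
# Assumes flat dictionaries with word-like keys and numeric values.
_BARE_KEY = re.compile(r'([{,]\s*)([A-Za-z_]\w*)\s*:')

def parse_kind(text):
    """Turn either form of the string into a dict by first making it valid JSON."""
    cleaned = text.strip("' ")                    # drop outer single quotes, if any
    cleaned = _BARE_KEY.sub(r'\1"\2":', cleaned)  # {d:18 -> {"d":18
    return json.loads(cleaned)

df = pd.DataFrame({
    "b": ["bibi", "cici"],
    "kind": ['{"d":11,"g":0.8,"r":45}', "'{d:18, g:0.1, r:75, f:6}'"],
})

# Parse every row, expand the dicts into columns, and align on the original index.
parsed = pd.json_normalize(df["kind"].apply(parse_kind).tolist())
parsed.index = df.index

result = pd.concat([df, parsed], axis="columns")
print(result)
```

Already-quoted keys are left alone because the pattern only matches a letter or underscore directly after the brace or comma. Since apply runs the helper once per row, it may be worth timing this on a sample before using it on several million rows.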