Home > Enterprise >  create new column and add values to cells of dataframe, from string dictionary like column
create new column and add values to cells of dataframe, from string dictionary like column

Time:05-30

I have a dataframe column that contain dictionary like strings. it comes in two ways. first option is dictionary within string such as '{"d":11,"g":0.8,"r":45}'. second option is like that: '{d:18, g:0.1, r:75, f:6}'. The data frame is several millions rows. I do not know in which row appear first or second option.

df_initial =       a     b    c                       kind
               0  0.50  bibi   23    '{"d":11,"g":0.8,"r":45}'
               1  0.80  cici  140     '{d:18, g:0.1, r:75, f:6}'
               2  0.01  didi  320  '{"d":101,"g":0.05,"r":32}'
               3  0.12  mimi    3         '{d:41,g:0.26,r:64}'

the desired dataframe

df_final =       a     b    c                                  kind      d     g   r    f
             0  0.50  bibi   23            '{'d':11, 'g':0.8, 'r':45}'   11  0.80  45  NaN
             1  0.80  cici  140             '{d:18, g:0.1, r:75, f:6}'   18  0.10  75  6.0
             2  0.01  didi  320          '{'d':101, 'g':0.05, 'r':32}'  101  0.05  32  NaN
             3  0.12  mimi    3                 '{d:41, g:0.26, r:64}'   41  0.26  64  NaN

CodePudding user response:

Because in sample data missing '' in keys like {d:18, g:0.1, r:75, f:6} for me json.loads and ast.literal_eval failed, so used:

L=[dict(y.split(':') for y in x.strip("'{} ").replace('"','').replace(', ',',').split(',')) 
   for x in df['kind']]


df = df.join(pd.DataFrame(L, index=df.index))

print (df)
      a     b    c                         kind    d     g   r    f
0  0.50  bibi   23    '{"d":11,"g":0.8,"r":45}'   11   0.8  45  NaN
1  0.80  cici  140   '{d:18, g:0.1, r:75, f:6}'   18   0.1  75    6
2  0.01  didi  320  '{"d":101,"g":0.05,"r":32}'  101  0.05  32  NaN
3  0.12  mimi    3         '{d:41,g:0.26,r:64}'   41  0.26  64  NaN

CodePudding user response:

You could pandas.json_normalize to convert the kind (has been json parsed) column to a dataframe, keep in mind that you might have many different keys in the dictionary.

import json
df = pd.DataFrame([('a', '{"d":11,"g":0.8,"r":45}'),], columns=['a', 'kind'])

In [6]: pd.json_normalize(df['kind'].apply(json.loads))
Out[6]:
    d    g   r
0  11  0.8  45

you can concat this new dataframe to the org along axis=columns/0 to get what you want

In [11]: pd.concat([df, pd.json_normalize(df['kind'].apply(json.loads))], axis='columns')
Out[11]:
   a                     kind   d    g   r
0  a  {"d":11,"g":0.8,"r":45}  11  0.8  45
  • Related