remove string from json row-CodePudding

I'm taking several columns from a data frame and adding them to a new column.

A B C
1 3 6
1 2 4 
4 5 0 

df['D'] = df.apply(lambda x: x[['C', 'B']].to_json(), axis=1)

I'm then creating a new data frame that locates the unique instances of df['A']:

df2 = pd.DataFrame({'A': df.A.unique()})

finally, I'm creating a new column in df2 that list the value of df['B'] and df['C']

df2['E'] = [list(set(df['D'].loc[df['A'] == x['A']]))
                     for _, x in df2.iterrows()]

but this is stringing each object:

A B C D
1 3 6 ['{"B":"3","C":6"}', '{"B":"2","C":4"}']

furthermore, when I dump this in JSON I get:

payload = json.dumps(data)

I get this result:

["{\"B\":\"3\",\"C\":"6"}", "{\"B\":\"2\",\"C\":"\4"}"]

but I'm ultimately looking to remove the string on the objects and have this as the output:

[{"B":"3","C":"6"}, {"B":"2","C":"4"}]

Any guidance will be greatly appreciated.

CodePudding user response：

In your case do groupby with to_dict

out = df.groupby('A').apply(lambda x : x[['B','C']].to_dict('records')).to_frame('E').reset_index()
out
Out[198]: 
   A                                     E
0  1  [{'B': 3, 'C': 6}, {'B': 2, 'C': 4}]
1  4                    [{'B': 5, 'C': 0}]