I have a dataframe which has a column containing multiple values, separated by ",".
id data
0 {'1':A, '2':B, '3':C}
1 {'1':A}
2 {'0':0}
How can I split up the keys-values of 'data' column and make a new column for each key values present in it, without removing the original 'data' column.
desired output.
id data 1 2 3 0
0 {'1':A, '2':B, '3':C} A B C Nan
1 {'1':A} A Nan Nan Nan
2 {'0':0} Nan Nan Nan 0
Thank you in advance :).
CodePudding user response:
You'll need a regular expression to convert the data into a format that can be parsed as JSON. Then, pd.json_normalize
will do the job nicely:
df['data'] = df['data'].str.replace(r'(["\'])\s*:(. ?)\s*(,?\s*["\'}])', '\\1:\'\\2\'\\3', regex=True)
import ast
df['data'] = df['data'].apply(ast.literal_eval)
df = pd.concat([df, pd.json_normalize(df['data'])], axis=1)
Output:
>>> df
data 1 2 3 0
0 {'1': 'A', '2': 'B', '3': 'C'} A B C NaN
1 {'1': 'A'} A NaN NaN NaN
2 {'0': '0'} NaN NaN NaN 0