My dataframe looks like this:
df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})
Expected output:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
A working solution would be df['b'].apply(pd.Series), but that doesn't work here because the b column holds strings, not dicts. I don't define the column structure myself, so I can't change that upstream.
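For illustration, the naive attempt doesn't expand anything, because each string just gets wrapped in a one-element Series (shown here only to make the problem concrete):
df['b'].apply(pd.Series)
# each string becomes a single-element Series, so the result is a single
# column (labelled 0) that still holds the raw strings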
CodePudding user response:
Use ast.literal_eval to turn the strings into actual dictionaries, then, as you already noted, apply pd.Series and join the result back:
import pandas as pd
import ast
df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})
# convert the strings to actual dictionaries
df['b'] = df['b'].apply(ast.literal_eval)
# expand the dictionaries into separate columns
out = df.join(df.pop('b').apply(pd.Series))
print(out)
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
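If the frame is large, pd.json_normalize is a common, usually faster alternative to apply(pd.Series) for the expansion step. A rough sketch of that variant on the same sample data (the result should match the output above):
import ast
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})

# parse the strings, expand all dicts in one go, then re-attach by index
parsed = df.pop('b').map(ast.literal_eval)
expanded = pd.json_normalize(parsed.tolist())
expanded.index = df.index   # keep rows aligned for the join
out = df.join(expanded)
print(out)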
CodePudding user response:
Use ast.literal_eval and pandas.concat like this:
import ast
pd.concat([df.a, df.b.apply(ast.literal_eval).apply(pd.Series)], axis=1)
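For completeness, the same one-liner as a self-contained script, together with the output it produces on the sample frame:
import ast
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})

out = pd.concat([df.a, df.b.apply(ast.literal_eval).apply(pd.Series)], axis=1)
print(out)
#    a    c    d
# 0  1  1.0  NaN
# 1  2  NaN  3.0
# 2  3  5.0  6.0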
CodePudding user response:
What if we do the transformation in parts? Say we have these data:
data = {'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]}
But instead of converting it all into a data frame at once, we handle the two parts separately. data['a']
is the easy part:
A = pd.Series(data['a']).rename('a')
Now for data['b']. I like literal_eval
as a nice and obvious way to handle this. But just for fun, why not look at these strings as JSON records?
B = pd.read_json(f"[{','.join(data['b'])}]".replace("'",'"'))
And the final touch:
df = pd.concat([A, B], axis=1)
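A note for newer pandas versions: passing a literal JSON string to read_json is deprecated there in favour of a file-like object, so a version-proof sketch of the same idea only adds an io.StringIO wrapper:
import io
import pandas as pd

data = {'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]}

A = pd.Series(data['a']).rename('a')

# join the records into one JSON array; single quotes must become double
# quotes to be valid JSON
records = f"[{','.join(data['b'])}]".replace("'", '"')
B = pd.read_json(io.StringIO(records))

df = pd.concat([A, B], axis=1)
print(df)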