My dataframe looks like this:
df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})
Expected output:
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
A working solution would be df['b'].apply(pd.Series), but that doesn't work here because the b column holds strings, not dicts. I don't define the column structure myself, so I can't change that upstream.
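For illustration, the naive attempt doesn't expand anything, because each string just gets wrapped in a one-element Series (shown here only to make the problem concrete):
df['b'].apply(pd.Series)
# each string becomes a single-element Series, so the result is a single
# column (labelled 0) that still holds the raw strings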
CodePudding user response:
Use ast.literal_eval to turn the strings into actual dictionaries, then, as you already noted, apply pd.Series and join the result back:
import pandas as pd
import ast
df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})
# convert the strings to actual dictionaries
df['b'] = df['b'].apply(ast.literal_eval)
# expand the dictionaries into separate columns
out = df.join(df.pop('b').apply(pd.Series))
print(out)
   a    c    d
0  1  1.0  NaN
1  2  NaN  3.0
2  3  5.0  6.0
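If the frame is large, pd.json_normalize is a common, usually faster alternative to apply(pd.Series) for the expansion step. A rough sketch of that variant on the same sample data (the result should match the output above):
import ast
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})

# parse the strings, expand all dicts in one go, then re-attach by index
parsed = df.pop('b').map(ast.literal_eval)
expanded = pd.json_normalize(parsed.tolist())
expanded.index = df.index   # keep rows aligned for the join
out = df.join(expanded)
print(out)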
CodePudding user response:
Use ast.literal_eval and pandas.concat like this:
import ast
pd.concat([df.a, df.b.apply(ast.literal_eval).apply(pd.Series)], axis=1)
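For completeness, the same one-liner as a self-contained script, together with the output it produces on the sample frame:
import ast
import pandas as pd

df = pd.DataFrame({'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]})

out = pd.concat([df.a, df.b.apply(ast.literal_eval).apply(pd.Series)], axis=1)
print(out)
#    a    c    d
# 0  1  1.0  NaN
# 1  2  NaN  3.0
# 2  3  5.0  6.0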
CodePudding user response:
What if we do the transformation in parts? Say we have these data:
data = {'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]}
But instead of converting it all into a data frame at once, we handle the two parts separately. data['a']
is the easy part:
A = pd.Series(data['a']).rename('a')
Now for data['b']. I like literal_eval
as a nice and obvious way to handle this. But just for fun, why not look at these strings as JSON records?
B = pd.read_json(f"[{','.join(data['b'])}]".replace("'",'"'))
And the final touch:
df = pd.concat([A, B], axis=1)
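A note for newer pandas versions: passing a literal JSON string to read_json is deprecated there in favour of a file-like object, so a version-proof sketch of the same idea only adds an io.StringIO wrapper:
import io
import pandas as pd

data = {'a':[1,2,3], 'b':["{'c':1}", "{'d':3}", "{'c':5, 'd':6}"]}

A = pd.Series(data['a']).rename('a')

# join the records into one JSON array; single quotes must become double
# quotes to be valid JSON
records = f"[{','.join(data['b'])}]".replace("'", '"')
B = pd.read_json(io.StringIO(records))

df = pd.concat([A, B], axis=1)
print(df)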