I have the following dataset:
import pandas as pd

df = pd.DataFrame({'sentence': ["sentence1", "sentence2"],
                   'Parent': [[['x', 'HackOrg'], ['xx', 'Purpose'], ['xxx', 'Area'], ['xxxx', 'HackOrg']],
                              [['xxxxx', 'Exp'], ['xxxxxx', 'Idus'], ['xxxxxxx', 'Area'], ['xxxxxxxx', 'Area']]]
                   })
    sentence                                             Parent
0  sentence1  [[x, HackOrg], [xx, Purpose], [xxx, Area], [xx...
1  sentence2  [[xxxxx, Exp], [xxxxxx, Idus], [xxxxxxx, Area]...
I want to get the following output:
    sentence  HackOrg  Purpose  Area  HackOrg    Exp    Idus     Area      Area
0  sentence1        x       xx   xxx     xxxx
1  sentence2                                   xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
Any ideas?
CodePudding user response:
Try:
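from itertools import count

c = count()
# Suffix each label with a running counter so repeated labels stay distinct dict keys
df.Parent = df.Parent.apply(lambda x: {f"{b}.{next(c)}": a for a, b in x})
# Expand the dicts into columns and blank out the missing cells
df = pd.concat([df, df.pop("Parent").apply(pd.Series)], axis=1).fillna("")
# Strip the ".<n>" suffixes to restore the original labels
df.columns = df.columns.str.replace(r"\.\d+$", "", regex=True)
print(df)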
Prints:
    sentence  HackOrg  Purpose  Area  HackOrg    Exp    Idus     Area      Area
0  sentence1        x       xx   xxx     xxxx
1  sentence2                                   xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
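In case it is not obvious why the counter is needed: a dict cannot hold the same key twice, so the second HackOrg would silently overwrite the first. A minimal plain-Python sketch of that step (the names pairs, naive, unique and labels are only for illustration and are not part of the answer above):

```python
import re
from itertools import count

# One row's [value, label] pairs, copied from the question's first row
pairs = [["x", "HackOrg"], ["xx", "Purpose"], ["xxx", "Area"], ["xxxx", "HackOrg"]]

# Without a suffix, the second 'HackOrg' overwrites the first one
naive = {label: value for value, label in pairs}

# With a running counter every key is unique, so no value is lost
c = count()
unique = {f"{label}.{next(c)}": value for value, label in pairs}

# Removing the trailing ".<digits>" recovers the original labels
labels = [re.sub(r"\.\d+$", "", key) for key in unique]

print(naive)
print(unique)
print(labels)
```

Because the counter is shared across rows, every pair gets its own column, which is why each row only fills its own four cells.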
CodePudding user response:
Using reshaping:
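# One row per [value, label] pair, keeping the original row index
s = df['Parent'].explode()
out = (pd
   .DataFrame(s.tolist(), index=s.index)
   # the second reset_index adds a unique 'level_0' per pair,
   # which keeps repeated labels in separate columns
   .reset_index().reset_index()
   .pivot_table(index='index', columns=['level_0', 1], values=0, aggfunc='first')
   .droplevel('level_0', axis=1).rename_axis(index=None, columns=None)
)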
Output:
   HackOrg  Purpose  Area  HackOrg    Exp    Idus     Area      Area
0        x       xx   xxx     xxxx    NaN     NaN      NaN       NaN
1      NaN      NaN   NaN      NaN  xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
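Note that this result has no sentence column and shows NaN where the question wants blanks. One possible way to close that gap, as a sketch rather than part of the answer above, is to fill the NaN values and concatenate the sentence column back on (the name result is hypothetical; pd.concat is used because it accepts the repeated column labels):

```python
import pandas as pd

df = pd.DataFrame({
    "sentence": ["sentence1", "sentence2"],
    "Parent": [
        [["x", "HackOrg"], ["xx", "Purpose"], ["xxx", "Area"], ["xxxx", "HackOrg"]],
        [["xxxxx", "Exp"], ["xxxxxx", "Idus"], ["xxxxxxx", "Area"], ["xxxxxxxx", "Area"]],
    ],
})

# Same reshaping as in the answer: one column per [value, label] pair
s = df["Parent"].explode()
out = (
    pd.DataFrame(s.tolist(), index=s.index)
    .reset_index().reset_index()
    .pivot_table(index="index", columns=["level_0", 1], values=0, aggfunc="first")
    .droplevel("level_0", axis=1)
    .rename_axis(index=None, columns=None)
)

# Assumption: blanks are wanted instead of NaN, as in the question's target
result = pd.concat([df[["sentence"]], out.fillna("")], axis=1)
print(result)
```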