Extract values contained in a list of lists in DataFrame columns

Time:12-12

I have the following dataset:

import pandas as pd

df = pd.DataFrame({'sentence': ["sentence1", "sentence2"],
'Parent': [[['x', 'HackOrg'], ['xx', 'Purpose'], ['xxx', 'Area'], ['xxxx', 'HackOrg']], [['xxxxx', 'Exp'], ['xxxxxx', 'Idus'], ['xxxxxxx', 'Area'], ['xxxxxxxx', 'Area']]]
})


    sentence                                             Parent
0  sentence1  [[x, HackOrg], [xx, Purpose], [xxx, Area], [xx...
1  sentence2  [[xxxxx, Exp], [xxxxxx, Idus], [xxxxxxx, Area]...

I want to get the following output, with one column per label (repeated labels kept as separate columns):

   sentence   HackOrg  Purpose  Area  HackOrg  Exp    Idus    Area     Area
0  sentence1  x        xx       xxx   xxxx
1  sentence2                                   xxxxx  xxxxxx  xxxxxxx  xxxxxxxx

Any ideas?

CodePudding user response:

Try:

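from itertools import count

# A shared counter makes every label unique ("HackOrg.0", "Purpose.1", ...),
# so repeated labels do not overwrite each other as dict keys.
c = count()
df.Parent = df.Parent.apply(lambda x: {f"{b}.{next(c)}": a for a, b in x})
# Expand each dict into columns, then blank out the missing cells.
df = pd.concat([df, df.pop("Parent").apply(pd.Series)], axis=1).fillna("")

# Strip the ".<number>" suffix to restore the original labels.
df.columns = df.columns.str.replace(r"\.\d+$", "", regex=True)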

print(df)

Prints:

    sentence HackOrg Purpose Area HackOrg    Exp    Idus     Area      Area
0  sentence1       x      xx  xxx    xxxx                                  
1  sentence2                               xxxxx  xxxxxx  xxxxxxx  xxxxxxxx
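
The counter suffix matters because dict keys must be unique: without it, a repeated label such as 'HackOrg' would silently overwrite the earlier value. Here is a minimal, self-contained sketch of that point using the sample data from the question. The names naive, unique, wide and result are illustrative only, and building the wide frame from a list of dicts with pd.DataFrame is a variation on the answer above (which uses apply(pd.Series)), not its exact code.

```python
import pandas as pd
from itertools import count

df = pd.DataFrame({
    "sentence": ["sentence1", "sentence2"],
    "Parent": [
        [["x", "HackOrg"], ["xx", "Purpose"], ["xxx", "Area"], ["xxxx", "HackOrg"]],
        [["xxxxx", "Exp"], ["xxxxxx", "Idus"], ["xxxxxxx", "Area"], ["xxxxxxxx", "Area"]],
    ],
})

# Naive version: the second 'HackOrg' overwrites the first, so 'x' is lost.
naive = {label: value for value, label in df.loc[0, "Parent"]}

# Suffix each label with a running number so every key is unique.
c = count()
unique = [
    {f"{label}.{next(c)}": value for value, label in pairs}
    for pairs in df["Parent"]
]

# One dict per row -> one wide row; labels missing from a row become "".
wide = pd.DataFrame(unique, index=df.index).fillna("")

# Drop the ".<number>" suffix so the original (possibly repeated) labels return.
wide.columns = wide.columns.str.replace(r"\.\d+$", "", regex=True)

result = pd.concat([df[["sentence"]], wide], axis=1)
print(result)
```

Note that result ends up with duplicate column names, so selecting result["Area"] returns several columns rather than one.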

CodePudding user response:

Using reshaping:

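# One row per [value, label] pair; the index repeats the original row number.
s = df['Parent'].explode()

out = (pd
 .DataFrame(s.tolist(), index=s.index)  # column 0 = value, column 1 = label
 .reset_index().reset_index()  # 'index' = original row, 'level_0' = unique position of each pair
 .pivot_table(index='index', columns=['level_0', 1], values=0, aggfunc='first')
 .droplevel('level_0', axis=1).rename_axis(index=None, columns=None)
)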

Output:

  HackOrg Purpose Area HackOrg    Exp    Idus     Area      Area
0       x      xx  xxx    xxxx    NaN     NaN      NaN       NaN
1     NaN     NaN  NaN     NaN  xxxxx  xxxxxx  xxxxxxx  xxxxxxxx