How to unstack only the list inside a list of pandas dataframe?

Time:10-05

Hi, I'm struggling to unstack a nested list in pandas. Can anyone please help?

This is the DataFrame:

import pandas as pd

df = pd.DataFrame({"a" : [1,2], "b" : [['a','b',['c','d']],['a',['b','c']]]})

I need to produce repeated rows by expanding only the inner list (the list inside the list), not the whole outer list.

The output DataFrame should look like this:

df2 = pd.DataFrame({"a" : [1,1,2,2], "b" : [['a','b','c'],['a','b','d'],['a','b'],['a','c']]})

Thanks in advance

CodePudding user response:

Assuming that the inner list is always the last element, you could expand the list column and then explode:

df[["a"]].join(df.b.apply(lambda x: [x[:-1] + [last_el] for last_el in x[-1]]).explode())

Prior to explode, this looks as follows:

   a                       b
0  1  [[a, b, c], [a, b, d]]
1  2        [[a, b], [a, c]]
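
For reference, the same idea can be written out step by step. This is only a sketch under the same assumption (the inner list is always the last element of each row's list); the helper name expand_last is made up for illustration and is not part of pandas.

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [["a", "b", ["c", "d"]], ["a", ["b", "c"]]]})

def expand_last(row_list):
    # Hypothetical helper: assumes the LAST element is the inner list
    # and everything before it is a prefix shared by all new rows.
    prefix, inner = row_list[:-1], row_list[-1]
    return [prefix + [item] for item in inner]

# Each cell becomes a list of candidate rows, e.g. [[a, b, c], [a, b, d]]
expanded = df["b"].apply(expand_last)

# explode() turns each candidate into its own row; join() brings back column "a"
result = df[["a"]].join(expanded.explode())
print(result)
```

If the inner list can sit somewhere other than the last position, this helper would need to change.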

CodePudding user response:

You can use itertools.product and explode:

from itertools import product

out = (df
  .assign(b=[list(map(list, product(*l))) for l in df['b']])
  .explode('b')
 )

Output:

   a          b
0  1  [a, b, c]
0  1  [a, b, d]
1  2     [a, b]
1  2     [a, c]
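
One caveat worth noting: product(*l) iterates over every element, so it works here because each plain element is a single-character string. A longer string such as 'ab' would be split into 'a' and 'b'. A possible variant, sketched here with made-up sample data and a hypothetical helper named combos, wraps every non-list element in a one-item list first:

```python
from itertools import product

import pandas as pd

# Sample data with multi-character strings (an assumption for illustration)
df = pd.DataFrame({"a": [1, 2], "b": [["ab", "cd", ["e", "f"]], ["ab", ["cd", "ef"]]]})

def combos(row_list):
    # Wrap non-list elements so product() treats each as a single choice
    choices = [el if isinstance(el, list) else [el] for el in row_list]
    return [list(c) for c in product(*choices)]

out = df.assign(b=df["b"].apply(combos)).explode("b")
print(out)
```

Because the wrapping does not depend on position, this variant should also cope with an inner list that is not the last element.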

CodePudding user response:

You could process your DataFrame series with something that unpacks your list objects, like:

def to_1D(series):
    return pd.Series([x for _list in series for x in _list])

Have a look at the article "Dealing with List Values in Pandas Dataframes".

Of course, as mentioned in the follow-ups to your question, the approach will need to change if the processing has to handle nested lists of varying depth.
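
To make the behaviour of to_1D concrete, here is a small usage sketch on the DataFrame from the question. It removes one level of nesting and returns a single flat Series, so the inner lists stay intact and the link to column "a" is lost; it is probably best treated as a building block rather than a full answer to the question.

```python
import pandas as pd

def to_1D(series):
    # Flatten one level: every element of every list becomes its own entry
    return pd.Series([x for _list in series for x in _list])

df = pd.DataFrame({"a": [1, 2], "b": [["a", "b", ["c", "d"]], ["a", ["b", "c"]]]})

flat = to_1D(df["b"])
print(flat)
```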
