Hi, I'm struggling to unstack this nested list in pandas. Can anyone please help?
This is the dataframe:
df = pd.DataFrame({"a" : [1,2], "b" : [['a','b',['c','d']],['a',['b','c']]]})
I need to create repeated rows by unstacking only the list inside the list, not the whole list.
The output dataframe should look like this:
df2 = pd.DataFrame({"a" : [1,1,2,2], "b" : [['a','b','c'],['a','b','d'],['a','b'],['a','c']]})
Thanks in advance
CodePudding user response:
Assuming that the inner list is always the last element, you could expand the list column and then explode:
df[["a"]].join(df.b.apply(lambda x: [x[:-1] + [last_el] for last_el in x[-1]]).explode())
Prior to explode, this looks as follows:
   a                       b
0  1  [[a, b, c], [a, b, d]]
1  2        [[a, b], [a, c]]
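For completeness, here is a self-contained sketch of the same idea. It still assumes the inner list is always the last element of each row's list, and the helper name `expand_last` is just something made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [["a", "b", ["c", "d"]], ["a", ["b", "c"]]]})

def expand_last(row_list):
    # Hypothetical helper: assumes the final element is the inner list.
    *head, tail = row_list
    # Build one full list per item of the inner list.
    return [head + [item] for item in tail]

# Each cell becomes a list of lists; explode then gives one row per inner list.
expanded = df["b"].apply(expand_last)
result = df[["a"]].join(expanded.explode())
```

The join is on the index, so the values in column a are repeated for every exploded row.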
CodePudding user response:
You can use itertools.product and explode:
from itertools import product
out = (df
       .assign(b=[list(map(list, product(*l))) for l in df['b']])
       .explode('b')
      )
Output:
   a          b
0  1  [a, b, c]
0  1  [a, b, d]
1  2     [a, b]
1  2     [a, c]
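One caveat: product(*l) works here because each plain value is a single-character string, which product iterates as a one-item sequence. A string like 'ab' would be split into 'a' and 'b'. A possible variant, assuming the plain values should be kept whole, wraps every non-list value in a one-element list first (the helper name `as_choices` is made up for this sketch):

```python
from itertools import product

import pandas as pd

# Same shape as the question's data, but with multi-character strings.
df = pd.DataFrame({"a": [1, 2], "b": [["ab", "b", ["c", "d"]], ["a", ["b", "cd"]]]})

def as_choices(item):
    # Hypothetical helper: a non-list value becomes a single-option list,
    # so product() never iterates over the characters of a string.
    return item if isinstance(item, list) else [item]

out = (
    df.assign(
        b=[[list(combo) for combo in product(*map(as_choices, row))] for row in df["b"]]
    )
    .explode("b")
)
```

This also does not depend on where the inner list sits in the row; if a row holds several inner lists, you get every combination of their items.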
CodePudding user response:
You could process your DataFrame series with something that unpacks your list objects, like:
def to_1D(series):
    return pd.Series([x for _list in series for x in _list])
Have a look at "Dealing with List Values in Pandas Dataframes".
Of course, as mentioned in the follow-ups to your question, this could vary if you need this processing to handle different depths of nested lists.
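As a quick check of what this does on the question's data, here is a small sketch. Note that it flattens only one level and returns a new Series with a fresh index, so the link to column a is lost and you would still need another step to get the requested rows:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [["a", "b", ["c", "d"]], ["a", ["b", "c"]]]})

def to_1D(series):
    # Flatten one level: every element of every list becomes its own entry.
    return pd.Series([x for _list in series for x in _list])

flat = to_1D(df["b"])
# The inner lists survive as single entries because only one level is unpacked.
print(flat.tolist())
```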