How to explode two columns of lists with different length using pandas

Time:12-15

I have a dataframe with two columns of lists:

>>> import pandas as pd
>>> df = pd.DataFrame({'A': ['x1','x2','x3', 'x4'], 'B':[['v1','v2'],['v3','v4'],['v6'],['v7','v8']], 'C':[['c1','c2'],['c3','c4'],['c5','c6'],['c7']]})
>>> df
    A         B         C
0  x1  [v1, v2]  [c1, c2]
1  x2  [v3, v4]  [c3, c4]
2  x3      [v6]  [c5, c6]
3  x4  [v7, v8]      [c7]

I would like to explode columns B and C, so the output looks like this:

>>> df_exploded
    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7

My current solution is to first separate out the rows where the lists in columns B and C have the same length and run df.explode(["B", "C"]) on them. For the remaining rows, I run df.explode("B") followed by df.explode("C").
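
A minimal sketch of that workaround, assuming the dataframe above. The names same_len, part_equal and part_other are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Rows whose B and C lists have the same length can be exploded together.
same_len = df['B'].map(len) == df['C'].map(len)
part_equal = df[same_len].explode(['B', 'C'])

# For the remaining rows, explode one column after the other. This relies on
# the shorter list holding a single element; otherwise it yields a cross product.
part_other = df[~same_len].explode('B').explode('C')

# Recombine in the original row order and renumber the index.
df_exploded = (pd.concat([part_equal, part_other])
               .sort_index(kind='stable')
               .reset_index(drop=True))
```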

I am wondering if there's a better solution. Thanks in advance!

CodePudding user response:

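Use itertools.zip_longest to pair the elements of B and C row by row (the shorter list is padded with None), then forward-fill the gaps within each original row: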

import itertools

df1 = (df.apply(lambda x: list(itertools.zip_longest(x['B'], x['C'])), axis=1)
       .explode()
       .apply(lambda x: pd.Series(x, index=['B', 'C']))
       .groupby(level=0).ffill())

df1

    B   C
0   v1  c1
0   v2  c2
1   v3  c3
1   v4  c4
2   v6  c5
2   v6  c6
3   v7  c7
3   v8  c7



Join df1 back onto column A to get the desired output:

df[['A']].join(df1)

output:

    A   B   C
0   x1  v1  c1
0   x1  v2  c2
1   x2  v3  c3
1   x2  v4  c4
2   x3  v6  c5
2   x3  v6  c6
3   x4  v7  c7
3   x4  v8  c7

If you want a fresh 0 to 7 index as in the question, call reset_index(drop=True) on the result.
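
For reference, here is a self-contained sketch of the same idea end to end, including the index reset. Building the intermediate frame from a list of tuples instead of apply(pd.Series) is a substitution made here, not part of the answer above, and the names pairs and result are illustrative:

```python
import itertools
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

# Pair up the elements of B and C per row; the shorter list is padded with None.
pairs = pd.Series(
    [list(itertools.zip_longest(b, c)) for b, c in zip(df['B'], df['C'])],
    index=df.index,
).explode()

# Turn the (B, C) tuples into columns, then fill each None with the previous
# value from the same original row.
df1 = (pd.DataFrame(pairs.tolist(), index=pairs.index, columns=['B', 'C'])
       .groupby(level=0).ffill())

# Join back onto A and replace the repeated index with 0..n-1.
result = df[['A']].join(df1).reset_index(drop=True)
```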

CodePudding user response:

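Multi-column explode (pandas 1.3 and later) can process both columns in a single call, but only when the lists in B and C have the same length within every row. It does not take care of mismatched lengths by itself: on the dataframe in the question, rows 2 and 3 make it raise "ValueError: columns must have matching element counts". If you first pad the shorter list in each row by repeating its last element, you no longer need to separate the rows, and one call is enough:

df_exploded = df.explode(["B", "C"], ignore_index=True)

With the lists padded, this gives the expected output: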

    A         B         C
0  x1        v1        c1
1  x1        v2        c2
2  x2        v3        c3
3  x2        v4        c4
4  x3        v6        c5
5  x3        v6        c6
6  x4        v7        c7
7  x4        v8        c7
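
Because multi-column explode raises a ValueError when a row's lists differ in length, one option is to pad the shorter list before exploding. A minimal sketch, assuming no list is empty; pad_to is a hypothetical helper, not a pandas function:

```python
import pandas as pd

df = pd.DataFrame({
    'A': ['x1', 'x2', 'x3', 'x4'],
    'B': [['v1', 'v2'], ['v3', 'v4'], ['v6'], ['v7', 'v8']],
    'C': [['c1', 'c2'], ['c3', 'c4'], ['c5', 'c6'], ['c7']],
})

def pad_to(values, length):
    """Repeat the last element until the list has `length` items (assumes non-empty)."""
    return values + [values[-1]] * (length - len(values))

# Target length per row: the longer of the two lists.
target = [max(len(b), len(c)) for b, c in zip(df['B'], df['C'])]

padded = df.assign(
    B=[pad_to(b, n) for b, n in zip(df['B'], target)],
    C=[pad_to(c, n) for c, n in zip(df['C'], target)],
)

# Every row now has matching element counts, so one explode call is enough.
df_exploded = padded.explode(['B', 'C'], ignore_index=True)
```

The padding rule (repeat the last element) matches the expected output in the question; a different rule, such as filling with None, would need a different helper.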