Home > front end >  how to use pandas function with condition, here are using .explode as example
how to use pandas function with condition, here are using .explode as example

Time:05-25

My df:

C1     C2       C3
 A   True  [a,b,c]
 A  False  [a,b,c]
 B   True  [a,b,c]
 B  False  [a,b,c]

What I want is, only apply .explode on C3 when C1 == A and C2 == True. How can I get to my expected result:

C1     C2       C3
 A   True        a
 A   True        b
 A   True        c
 A  False  [a,b,c]
 B   True  [a,b,c]
 B  False  [a,b,c]

CodePudding user response:

You can select the rows to be exploded and concat it with other rows.

m = df['C2'] & df['C1'].eq('A')

out = pd.concat([df[m].explode(['C3']), df[~m]])
print(out)

  C1     C2         C3
0  A   True          a
0  A   True          b
0  A   True          c
1  A  False  [a, b, c]
2  B   True  [a, b, c]
3  B  False  [a, b, c]

Since we select the rows to be exploded first, if you want to keep the original order, you can

out = (pd.concat([df[m].explode(['C3']), df[~m]])
       .sort_index()
       .reset_index(drop=True))
print(df)
  C1     C2             C3
0  A   True  ['a','b','c']
1  A  False  ['a','b','c']
2  B   True  ['a','b','c']
3  B  False  ['a','b','c']
4  A   True  ['a','b','c']

print(out)
  C1     C2         C3
0  A   True          a
1  A   True          b
2  A   True          c
3  A  False  [a, b, c]
4  B   True  [a, b, c]
5  B  False  [a, b, c]
6  A   True          a
7  A   True          b
8  A   True          c

If you don't care about the order

out = pd.concat([df[m].explode(['C3']), df[~m]], ignore_index=True)
print(out)

  C1     C2         C3
0  A   True          a
1  A   True          b
2  A   True          c
3  A   True          a
4  A   True          b
5  A   True          c
6  A  False  [a, b, c]
7  B   True  [a, b, c]
8  B  False  [a, b, c]

CodePudding user response:

Another approach, you can mask the values that don't find the criteria (with where), explode and combine_first:

m = df['C2'] & df['C1'].eq('A')
out = df.where(m).explode('C3').combine_first(df)

Or, probably more efficient but less concise:

out = df.assign(C3=df['C3'].where(m)).explode('C3').fillna({'C3': df['C3']})

output:

  C1     C2         C3
0  A   True          a
0  A   True          b
0  A   True          c
1  A  False  [a, b, c]
2  B   True  [a, b, c]
3  B  False  [a, b, c]
  • Related