stop duplicating specific column values in explode()

Time:09-03

Alternatives to explode() are also welcome.

DataFrame:

import pandas as pd

A = ['ABC123', 'ABC124', 'ABC125', 'ABC126', 'ABC127']
B = ['ABC12', 'ABC13', 'ABC14', 'ABC15', 'ABC15']
C = ['aa, AA', 'bb', 'cc', 'dd', 'ee, EE']
D = [20, 30, 50, 54, 58]
df = pd.DataFrame({'A': A, 'B': B, 'C': C, 'D': D})

Using explode, I am trying to create a separate row for each value in column 'C', e.g. 'aa, AA' should become two rows.

# split on ", " (comma plus space) so the pieces carry no leading whitespace
df['C'] = df['C'].apply(lambda x: x.split(", "))
df = df.explode(['C'])

After explode, the DataFrame looks like this:

        A   B       C   D
0   ABC123  ABC12   aa  20
0   ABC123  ABC12   AA  20
1   ABC124  ABC13   bb  30
2   ABC125  ABC14   cc  50
3   ABC126  ABC15   dd  54
4   ABC127  ABC15   ee  58
4   ABC127  ABC15   EE  58

Issue: I want every column's values to be duplicated except column 'D', which should be 0 in the extra rows.

Desired output

        A   B       C   D
0   ABC123  ABC12   aa  20
0   ABC123  ABC12   AA  00
1   ABC124  ABC13   bb  30
2   ABC125  ABC14   cc  50
3   ABC126  ABC15   dd  54
4   ABC127  ABC15   ee  58
4   ABC127  ABC15   EE  00

Answer:

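Although not impossible, it would be less practical and less efficient to prevent D from being exploded. Instead, you can post-process the DataFrame to correct the values after explode.

explode repeats the original index label for every row it creates, so any row whose index label has already appeared is an extra row. Use index.duplicated(), which is True for every occurrence after the first, as a boolean mask: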

df.loc[df.index.duplicated(), 'D'] = 0

Output:

        A      B    C   D
0  ABC123  ABC12   aa  20
0  ABC123  ABC12   AA   0
1  ABC124  ABC13   bb  30
2  ABC125  ABC14   cc  50
3  ABC126  ABC15   dd  54
4  ABC127  ABC15   ee  58
4  ABC127  ABC15   EE   0
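
The index-based fix above assumes the index was unique before explode; if the input already has repeated index labels, index.duplicated() would also flag rows that explode did not create. A minimal sketch of one way to guard against that, assuming it is acceptable to discard the original index, is to reset the index first. The helper name explode_zero_repeats and its parameters are hypothetical and not part of pandas:

```python
import pandas as pd


def explode_zero_repeats(df, list_col, zero_col, sep=", "):
    """Hypothetical helper: explode `list_col`, then set `zero_col` to 0
    on every extra row that explode created."""
    # Assumption: the original index can be discarded. A fresh 0..n-1 index
    # guarantees that repeated labels after explode come only from explode.
    out = df.reset_index(drop=True)
    out[list_col] = out[list_col].str.split(sep)
    out = out.explode(list_col)
    # duplicated() is False for the first occurrence of a label, True afterwards.
    out.loc[out.index.duplicated(), zero_col] = 0
    return out


df = pd.DataFrame({
    "A": ["ABC123", "ABC124", "ABC125", "ABC126", "ABC127"],
    "B": ["ABC12", "ABC13", "ABC14", "ABC15", "ABC15"],
    "C": ["aa, AA", "bb", "cc", "dd", "ee, EE"],
    "D": [20, 30, 50, 54, 58],
})

result = explode_zero_repeats(df, "C", "D")
print(result)
```

With the sample data from the question, column D in result is 20, 0, 30, 50, 54, 58, 0, and the original df is left unchanged.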