Alternatives to explode() are also welcome.
DataFrame:
import pandas as pd
A = ['ABC123', 'ABC124', 'ABC125', 'ABC126', 'ABC127']
B = ['ABC12', 'ABC13', 'ABC14', 'ABC15', 'ABC15']
C = ['aa, AA', 'bb', 'cc', 'dd', 'ee, EE']
D = [20, 30, 50, 54, 58]
df = pd.DataFrame({'A':A, 'B':B, 'C':C, 'D':D})
Using explode, I am trying to make a separate row for each value in column 'C', e.g. 'aa, AA':
df['C'] = df['C'].apply(lambda x: x.split(", "))  # split on ", " so 'AA' has no leading space
df = df.explode('C')
After explode, the DataFrame looks like this:
        A      B   C   D
0  ABC123  ABC12  aa  20
0  ABC123  ABC12  AA  20
1  ABC124  ABC13  bb  30
2  ABC125  ABC14  cc  50
3  ABC126  ABC15  dd  54
4  ABC127  ABC15  ee  58
4  ABC127  ABC15  EE  58
Issue: I want every column's values duplicated except column 'D', which should be 0 in the rows added by explode.
Desired output:
        A      B   C   D
0  ABC123  ABC12  aa  20
0  ABC123  ABC12  AA  00
1  ABC124  ABC13  bb  30
2  ABC125  ABC14  cc  50
3  ABC126  ABC15  dd  54
4  ABC127  ABC15  ee  58
4  ABC127  ABC15  EE  00
CodePudding user response:
Although it is possible, preventing D from being exploded would be less practical and less efficient. Instead, you can post-process the DataFrame to correct the values after explode.
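explode repeats the original index label on every row it creates, so (provided the index was unique beforehand) df.index.duplicated() is True for every row after the first in each group. Use that as a boolean mask: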
df.loc[df.index.duplicated(), 'D'] = 0
output:
        A      B   C   D
0  ABC123  ABC12  aa  20
0  ABC123  ABC12  AA   0
1  ABC124  ABC13  bb  30
2  ABC125  ABC14  cc  50
3  ABC126  ABC15  dd  54
4  ABC127  ABC15  ee  58
4  ABC127  ABC15  EE   0
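For reference, here is the whole thing as one self-contained sketch. The helper name explode_zero_fill and its parameters are hypothetical, made up for this example. It assumes the index is unique before explode, since the duplicated-index mask relies on that, and it strips whitespace around each split value.

```python
import pandas as pd


def explode_zero_fill(df, list_col, zero_cols):
    """Hypothetical helper: explode `list_col`, then zero `zero_cols` on the added rows.

    Assumes df.index is unique before the call.
    """
    # Split on commas and strip whitespace so 'aa, AA' becomes ['aa', 'AA'].
    parts = df[list_col].apply(lambda x: [s.strip() for s in x.split(',')])
    out = df.assign(**{list_col: parts}).explode(list_col)
    # Rows created by explode share the index label of their source row,
    # so every repeat after the first is flagged by index.duplicated().
    out.loc[out.index.duplicated(), zero_cols] = 0
    return out


df = pd.DataFrame({
    'A': ['ABC123', 'ABC124', 'ABC125', 'ABC126', 'ABC127'],
    'B': ['ABC12', 'ABC13', 'ABC14', 'ABC15', 'ABC15'],
    'C': ['aa, AA', 'bb', 'cc', 'dd', 'ee, EE'],
    'D': [20, 30, 50, 54, 58],
})

out = explode_zero_fill(df, 'C', ['D'])
print(out)
```

If the original index is not unique, calling df.reset_index(drop=True) before the helper should restore that assumption, at the cost of losing the original labels.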