Home > database >  How to change values for following duplicated rows by a ID - Python
How to change values for following duplicated rows by a ID - Python

Time:10-11

I am trying to find a solution that if the ID appears duplicated goes to another column and only keeps, the first value, and transform the others into 0.

what I have

test_2.query('FIX_VALUE != 0').head(10)

load_id_VR  ITEM_MEANING    LINE_COST   FIX_VALUE
672 19082022228668VR00500982    MILK RUN    452.46  -0.01
673 19082022228668VR00500982    STOP SUP MILK   34.57   -0.01
674 19082022228668VR00500982    SURTAXE GASOIL  60.90   -0.01
982 19082022280448VR00683048    EMPTIES LEG1    28.24   0.01
983 19082022280448VR00683048    MOBILITY PACKAGE    3.39    0.01
984 19082022280448VR00683048    SURTAXE GASOIL  2.81    0.01
991 19082022280448VR00683044    EMPTIES LEG1    13.18   0.01
992 19082022280448VR00683044    MOBILITY PACKAGE    1.58    0.01
993 19082022280448VR00683044    SURTAXE GASOIL  1.31    0.01
1033    1908202228144810027446  MATERIAL LEG2   62.99   -0.01

what I trying to reach


load_id_VR  ITEM_MEANING    LINE_COST   FIX_VALUE
672 19082022228668VR00500982    MILK RUN    452.46  -0.01
673 19082022228668VR00500982    STOP SUP MILK   34.57   0.00
674 19082022228668VR00500982    SURTAXE GASOIL  60.90   0.00
982 19082022280448VR00683048    EMPTIES LEG1    28.24   0.01
983 19082022280448VR00683048    MOBILITY PACKAGE    3.39    0.00
984 19082022280448VR00683048    SURTAXE GASOIL  2.81    0.00
991 19082022280448VR00683044    EMPTIES LEG1    13.18   0.01
992 19082022280448VR00683044    MOBILITY PACKAGE    1.58    0.00
993 19082022280448VR00683044    SURTAXE GASOIL  1.31    0.00
1033    1908202228144810027446  MATERIAL LEG2   62.99   -0.01

does it exist a simple way to do this? Or I need to do something like this:

for i in range(0, len(df)):
    if (df['load_id_VR'].iloc[i] == df['load_id_VR'].iloc[i 1]):
        df['FIX_VALUE'].iloc[i 1] = 0.00

Didn't test the loop, wanted to avoid a loop here, if it would exist some command in Pandas our Numpy that would do this straight forwardly.

CodePudding user response:

here is one way to do it

# using mask, assign the value of zero where load_id_vr is duplicated
df['FIX_VALUE']=df['FIX_VALUE'].mask(df.duplicated(subset='load_id_VR'),0)
df

OR using LOC

df.loc[df.duplicated(subset='load_id_VR'), 'FIX_VALUE'] = 0
df
load_id_VR  ITEM_MEANING    LINE_COST   FIX_VALUE
672     19082022228668VR00500982    MILK RUN    452.46  -0.01
673     19082022228668VR00500982    STOP SUP MILK   34.57   0.00
674     19082022228668VR00500982    SURTAXE GASOIL  60.90   0.00
982     19082022280448VR00683048    EMPTIES LEG1    28.24   0.01
983     19082022280448VR00683048    MOBILITY PACKAGE    3.39    0.00
984     19082022280448VR00683048    SURTAXE GASOIL  2.81    0.00
991     19082022280448VR00683044    EMPTIES LEG1    13.18   0.01
992     19082022280448VR00683044    MOBILITY PACKAGE    1.58    0.00
993     19082022280448VR00683044    SURTAXE GASOIL  1.31    0.00
1033    1908202228144810027446  MATERIAL LEG2   62.99   -0.01
  • Related