Replacing the first row of a selected column in each group with 0

Time:10-15

Existing df:

Id      status      value
A1      clear       23
A1      in-process  50
A1      done        20
B1      start       2
B1      end         30

Expected df:

Id      status      value
A1      clear       0
A1      in-process  50
A1      done        20
B1      start       0
B1      end         30

I'm looking to replace the first value of each group (grouped by Id) in the value column with 0.

CodePudding user response:

Use Series.duplicated to flag repeated Id values. Inverting that mask with ~ leaves True only on the first row of each Id, and DataFrame.loc can then set value to 0 on those rows:

df.loc[~df['Id'].duplicated(), 'value'] = 0
print(df)
   Id      status  value
0  A1       clear      0
1  A1  in-process     50
2  A1        done     20
3  B1       start      0
4  B1         end     30
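
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's frame and applies the same mask. The variable name first_in_group is only illustrative and does not come from the answer.

```python
import pandas as pd

# Rebuild the DataFrame from the question.
df = pd.DataFrame({
    "Id": ["A1", "A1", "A1", "B1", "B1"],
    "status": ["clear", "in-process", "done", "start", "end"],
    "value": [23, 50, 20, 2, 30],
})

# duplicated() is False for the first occurrence of each Id and True
# for every later one, so inverting it marks the first row per Id.
first_in_group = ~df["Id"].duplicated()

# Overwrite 'value' only on those first rows.
df.loc[first_in_group, "value"] = 0

print(first_in_group.tolist())  # [True, False, False, True, False]
print(df["value"].tolist())     # [0, 50, 20, 0, 30]
```

Because duplicated() tracks every Id seen so far, this does not depend on rows with the same Id being next to each other.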

CodePudding user response:

One approach could be as follows:

  • Compare each value in df.Id with the value in the previous row, combining Series.shift with Series.ne. This returns a boolean Series that is True on the first row of each new run of Id values.
  • Next, use df.loc with that mask to select those rows in the value column and assign 0.

df.loc[df.Id.ne(df.Id.shift()), 'value'] = 0

print(df)

   Id      status  value
0  A1       clear      0
1  A1  in-process     50
2  A1        done     20
3  B1       start      0
4  B1         end     30

N.B. This approach assumes that rows with the same Id are contiguous (as they appear to be here). If they are not, you could first run df.sort_values('Id', kind='stable', inplace=True), where kind='stable' keeps the original row order within each Id. If sorting is needed, though, the answer by @jezrael is likely to be faster, since it needs no sort.
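
To make that caveat concrete, here is a small sketch on a made-up frame whose Id values are interleaved. The data and the names unsorted, shift_mask and dup_mask are hypothetical and not from the question. It compares the two masks without modifying anything.

```python
import pandas as pd

# Hypothetical frame in which rows with the same Id are NOT contiguous.
unsorted = pd.DataFrame({
    "Id": ["A1", "B1", "A1", "B1"],
    "value": [23, 2, 50, 30],
})

# shift/ne flags every row whose Id differs from the previous row,
# so each change of Id looks like the start of a new group.
shift_mask = unsorted["Id"].ne(unsorted["Id"].shift())

# duplicated() remembers every Id seen so far, so only the true
# first occurrence of each Id is flagged.
dup_mask = ~unsorted["Id"].duplicated()

print(shift_mask.tolist())  # [True, True, True, True]
print(dup_mask.tolist())    # [True, True, False, False]
```

On this input the shift-based mask would zero out every row, while the duplicated-based mask still picks exactly one row per Id.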
