Existing df:
Id status value
A1 clear 23
A1 in-process 50
A1 done 20
B1 start 2
B1 end 30
Expected df:
Id status value
A1 clear 0
A1 in-process 50
A1 done 20
B1 start 0
B1 end 30
I'm looking to replace the first value of each Id group with 0.
CodePudding user response:
Use Series.duplicated to flag the repeated Id values, invert the mask with ~ so that only the first row of each Id is selected, and assign 0 to those rows with DataFrame.loc:
df.loc[~df['Id'].duplicated(), 'value'] = 0
print(df)
Id status value
0 A1 clear 0
1 A1 in-process 50
2 A1 done 20
3 B1 start 0
4 B1 end 30
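To make the mask easier to follow, here is a self-contained sketch that rebuilds the question's data and keeps the intermediate steps. The names is_repeat and is_first are only illustrative and do not appear in the answer above.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": ["A1", "A1", "A1", "B1", "B1"],
    "status": ["clear", "in-process", "done", "start", "end"],
    "value": [23, 50, 20, 2, 30],
})

# duplicated() is False for the first occurrence of each Id and True for repeats.
is_repeat = df["Id"].duplicated()

# Inverting with ~ leaves True only on the first row of each Id.
is_first = ~is_repeat

# Assign 0 to the 'value' column on those first rows only.
df.loc[is_first, "value"] = 0
print(df)
```

Because duplicated() tracks values it has already seen, this works even when the rows of one Id are not next to each other.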
CodePudding user response:
One approach could be as follows:
- Compare the value in each row of df.Id with the previous row, combining Series.shift with Series.ne. This returns a boolean Series with True for the first row of each new Id value.
- Next, use df.loc to select only the rows with True, for column value, and assign 0.
df.loc[df.Id.ne(df.Id.shift()), 'value'] = 0
print(df)
Id status value
0 A1 clear 0
1 A1 in-process 50
2 A1 done 20
3 B1 start 0
4 B1 end 30
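The two steps can also be looked at separately. The sketch below rebuilds the question's data and stores the shifted column and the mask under illustrative names (previous_id and starts_new_run are not part of the answer) before assigning.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Id": ["A1", "A1", "A1", "B1", "B1"],
    "status": ["clear", "in-process", "done", "start", "end"],
    "value": [23, 50, 20, 2, 30],
})

# shift() moves every Id down one row, so each row sees the Id of the row above.
# The first row has nothing above it and gets NaN.
previous_id = df["Id"].shift()

# ne() is an element-wise "not equal"; comparing with NaN also yields True,
# so the very first row is marked as the start of a run.
starts_new_run = df["Id"].ne(previous_id)

df.loc[starts_new_run, "value"] = 0
print(df)
```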
N.B. This approach assumes that the "groups" in Id are sorted, or at least that the rows of each Id sit next to each other (as they appear to here). If that is not the case, you could call df.sort_values('Id', inplace=True) first, but if sorting is necessary, the answer by @jezrael will surely be faster.
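A small sketch of why the ordering matters. The data below is made up for illustration (it is not from the question) and interleaves the Ids on purpose. The kind="stable" argument is an addition here, used so that rows keep their original relative order within each Id after sorting.

```python
import pandas as pd

# Made-up data, not from the question: the Ids are interleaved rather than grouped.
df = pd.DataFrame({"Id": ["A1", "B1", "A1", "B1"], "value": [1, 2, 3, 4]})

# shift/ne marks every row here, because each Id differs from the row above it.
run_starts = df["Id"].ne(df["Id"].shift())

# duplicated() marks only the first time each Id appears, whatever the row order.
first_seen = ~df["Id"].duplicated()

# A stable sort makes the rows of each Id contiguous while keeping their
# original relative order, after which shift/ne finds the same first rows.
by_id = df.sort_values("Id", kind="stable")
run_starts_sorted = by_id["Id"].ne(by_id["Id"].shift())

print(run_starts.tolist())
print(first_seen.tolist())
print(run_starts_sorted.sort_index().tolist())
```

On unsorted data the two masks disagree, so the duplicated approach is the safer choice unless the frame is sorted first.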