I have a dataframe df which looks something like this:
key | id |
---|---|
x | 0.6 |
x | 0.5 |
x | 0.43 |
x | 0.56 |
y | 13 |
y | 14 |
y | 0.4 |
y | 0.1 |
I'd like to replace the last value for every key with 0, so that the df looks like this:
key | id |
---|---|
x | 0.6 |
x | 0.5 |
x | 0.43 |
x | 0 |
y | 13 |
y | 14 |
y | 0.4 |
y | 0 |
I've tried the following:
for i in df['key'].unique():
    df.loc[df['key'] == i, 'id'].iat[-1] = 0
The problem is that this does not replace the actual value in df. What am I missing? Perhaps there is also a better-performing way to tackle this problem.
CodePudding user response:
Use Series.duplicated to find the last row per key, then set 0 with DataFrame.loc:
df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
How it works:
print (df.assign(mask=df['key'].duplicated(keep='last'),
invert_mask=~df['key'].duplicated(keep='last')))
key id mask invert_mask
0 x 0.60 True False
1 x 0.50 True False
2 x 0.43 True False
3 x 0.00 False True
4 y 13.00 True False
5 y 14.00 True False
6 y 0.40 True False
7 y 0.00 False True
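For anyone who wants to reproduce this, here is a minimal self-contained sketch. The frame is rebuilt from the question's sample data, and the second frame (named interleaved here, a name chosen only for illustration) is an assumption added to show that duplicated(keep='last') looks at the last occurrence of each key overall, so the keys do not need to be sorted or contiguous.

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# duplicated(keep='last') is True for every row except the last
# occurrence of each key, so inverting it selects the last rows.
last_row_mask = ~df["key"].duplicated(keep="last")
df.loc[last_row_mask, "id"] = 0

# Hypothetical second frame with interleaved keys: the mask still
# picks the last occurrence of each key.
interleaved = pd.DataFrame({"key": ["x", "y", "x", "y"], "id": [1.0, 2.0, 3.0, 4.0]})
interleaved_mask = ~interleaved["key"].duplicated(keep="last")
interleaved.loc[interleaved_mask, "id"] = 0
```

Because the assignment is a single df.loc call on the original frame, it writes in place and needs no loop over the keys.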
Another solution is simply to multiply the id column by the boolean mask:
df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
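As for why the loop in the question leaves df unchanged: df.loc[df['key'] == i, 'id'] with a boolean mask returns a new Series, so .iat[-1] = 0 modifies that temporary object rather than df. If you do want to keep the loop, a sketch along these lines should work, assuming the index labels are unique (the variable name last_label is only illustrative):

```python
import pandas as pd

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

for k in df["key"].unique():
    # Index label of the last row for this key (assumes a unique index).
    last_label = df.index[df["key"] == k][-1]
    # One .loc call on df itself, so the write reaches the original frame.
    df.loc[last_label, "id"] = 0
```

The vectorized versions above avoid the Python-level loop entirely, which is likely to matter more as the number of distinct keys grows.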
CodePudding user response:
You can use groupby.cumcount to access the nth row per group from the end (with ascending=False), combined with boolean indexing:
df.loc[df.groupby('key').cumcount(ascending=False).eq(0), 'id'] = 0
output:
key id
0 x 0.60
1 x 0.50
2 x 0.43
3 x 0.00
4 y 13.00
5 y 14.00
6 y 0.40
7 y 0.00
Intermediate:
key id cumcount eq(0)
0 x 0.60 3 False
1 x 0.50 2 False
2 x 0.43 1 False
3 x 0.56 0 True
4 y 13.00 3 False
5 y 14.00 2 False
6 y 0.40 1 False
7 y 0.10 0 True
You can easily adapt this to any row, for example the second-to-last row per group:
df.loc[df.groupby('key').cumcount(ascending=False).eq(1), 'id'] = 0
For the third row per group:
df.loc[df.groupby('key').cumcount().eq(2), 'id'] = 0
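If you need this for several positions, the pattern can be wrapped in a small helper. This is only a sketch: zero_nth_from_end and its parameters are hypothetical names, not part of pandas, and it returns a modified copy rather than changing the input.

```python
import pandas as pd

def zero_nth_from_end(frame, key, col, n=0):
    """Return a copy of frame with col set to 0 in the n-th row
    from the end of each key group (n=0 is the last row)."""
    out = frame.copy()
    # Position of each row counted from the end of its group: 0 = last row.
    pos_from_end = out.groupby(key).cumcount(ascending=False)
    out.loc[pos_from_end.eq(n), col] = 0
    return out

# Sample data rebuilt from the question.
df = pd.DataFrame({
    "key": ["x", "x", "x", "x", "y", "y", "y", "y"],
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

last_zeroed = zero_nth_from_end(df, "key", "id")              # last row per key
second_last_zeroed = zero_nth_from_end(df, "key", "id", n=1)  # second-to-last row
```

Groups with fewer than n + 1 rows have no matching position and are left untouched.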