Replacing the last row value of a specific column value

Time:11-22

I have a dataframe df which looks something like this:

key id
x 0.6
x 0.5
x 0.43
x 0.56
y 13
y 14
y 0.4
y 0.1

I'd like to replace the last value for every key with 0, so that the df looks like this:

key id
x 0.6
x 0.5
x 0.43
x 0
y 13
y 14
y 0.4
y 0

I've tried the following:

for i in df['key'].unique():
   df.loc[df['key'] == i, 'id'].iat[-1] = 0

The problem is that it does not replace the actual value in the df. What am I missing? Perhaps there's also a better-performing way to tackle this problem.

CodePudding user response:

Use Series.duplicated to flag the last row per key, then set it to 0 with DataFrame.loc:

df.loc[~df['key'].duplicated(keep='last'), 'id'] = 0

print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00
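
As for why the loop in the question has no effect: selecting with a boolean mask through .loc returns a new Series, so .iat[-1] = 0 writes to that temporary object and df stays untouched. A minimal sketch of the problem and of one way to repair the loop, assuming the default unique index (the names tmp, before_fix and last_label are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "key": list("xxxxyyyy"),
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Boolean-mask selection returns a copy, so this write never reaches df.
tmp = df.loc[df["key"] == "x", "id"]
tmp.iat[-1] = 0
before_fix = df.loc[3, "id"]  # value in df is unchanged by the write above

# Repaired loop: look up the index label of the last row per key and
# assign through a single .loc call on df itself (assumes unique labels).
for k in df["key"].unique():
    last_label = df.index[df["key"] == k][-1]
    df.loc[last_label, "id"] = 0

print(df)
```

The vectorized duplicated version above does the same thing without a Python loop, so it is likely the better choice on large frames.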

How it works:

print (df.assign(mask=df['key'].duplicated(keep='last'),
                 invert_mask=~df['key'].duplicated(keep='last')))
  key     id   mask  invert_mask
0   x   0.60   True        False
1   x   0.50   True        False
2   x   0.43   True        False
3   x   0.00  False         True
4   y  13.00   True        False
5   y  14.00   True        False
6   y   0.40   True        False
7   y   0.00  False         True

Another solution is to simply multiply the id column by the boolean mask:

df['id'] = df['key'].duplicated(keep='last').mul(df['id'])
print (df)
  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00
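
Note that the multiplication trick only works because the replacement value is 0, and a NaN in the last row would stay NaN (NaN times 0 is still NaN). If a different replacement value is needed, Series.mask is one possible alternative. A small sketch, where the value -1 and the name is_last are just assumptions for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "key": list("xxxxyyyy"),
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# True only for the last occurrence of each key
is_last = ~df["key"].duplicated(keep="last")

# Replace the flagged rows with an arbitrary value (here -1), keep the rest
df["id"] = df["id"].mask(is_last, -1)
print(df)
```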

CodePudding user response:

You can use groupby.cumcount to access the nth row per group from the end (with ascending=False), and boolean indexing:

df.loc[df.groupby('key').cumcount(ascending=False).eq(0), 'id'] = 0

output:

  key     id
0   x   0.60
1   x   0.50
2   x   0.43
3   x   0.00
4   y  13.00
5   y  14.00
6   y   0.40
7   y   0.00

Intermediate:

  key     id  cumcount  eq(0)
0   x   0.60         3  False
1   x   0.50         2  False
2   x   0.43         1  False
3   x   0.56         0   True
4   y  13.00         3  False
5   y  14.00         2  False
6   y   0.40         1  False
7   y   0.10         0   True
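
The intermediate table above can be reproduced roughly like this, starting from the original data. The column name is_last stands in for the eq(0) header, and from_end is just an illustrative variable name:

```python
import pandas as pd

df = pd.DataFrame({
    "key": list("xxxxyyyy"),
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

# Position of each row within its group, counted from the end (last row = 0)
from_end = df.groupby("key").cumcount(ascending=False)

# Show the helper columns next to the data without modifying df
print(df.assign(cumcount=from_end, is_last=from_end.eq(0)))
```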

You can easily adapt this to any row, for example the second-to-last row per group:

df.loc[df.groupby('key').cumcount(ascending=False).eq(1), 'id'] = 0

For the third row per group:

df.loc[df.groupby('key').cumcount().eq(2), 'id'] = 0
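
If this is needed in several places, the two variants could be wrapped in a small helper. This is only a sketch: set_nth_per_group and its parameters are hypothetical names, not part of pandas, and n is 0-based as in cumcount:

```python
import pandas as pd

def set_nth_per_group(df, key, col, n, value=0, from_end=False):
    """Return a copy of df with `col` set to `value` in the n-th row
    (0-based) of each `key` group, counted from the start or the end."""
    # cumcount numbers rows per group; ascending=False counts from the end
    pos = df.groupby(key).cumcount(ascending=not from_end)
    out = df.copy()
    out.loc[pos.eq(n), col] = value
    return out

df = pd.DataFrame({
    "key": list("xxxxyyyy"),
    "id": [0.6, 0.5, 0.43, 0.56, 13, 14, 0.4, 0.1],
})

last_zeroed = set_nth_per_group(df, "key", "id", 0, from_end=True)
third_zeroed = set_nth_per_group(df, "key", "id", 2)
print(last_zeroed)
print(third_zeroed)
```

Groups with fewer than n + 1 rows are left unchanged, because no row in them matches the requested position.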