I have a df
sorted by EmpID
and Date
. Now I need to update the Reason
column of the first occurrence for each EmpID
with the value 100.
Original:
EmpID | Date | Reason |
---|---|---|
123 | 01/01/2022 | 0 |
123 | 01/02/2022 | 500 |
123 | 01/03/2022 | 0 |
124 | 01/01/2022 | 0 |
124 | 01/02/2022 | 800 |
124 | 01/03/2022 | 0 |
Result:
EmpID | Date | Reason |
---|---|---|
123 | 01/01/2022 | 100 |
123 | 01/02/2022 | 500 |
123 | 01/03/2022 | 0 |
124 | 01/01/2022 | 100 |
124 | 01/02/2022 | 800 |
124 | 01/03/2022 | 0 |
I'm not getting close in any attempts. Would it be easier to update the min date of each EmpID
and set Reason
to 100?
I have tried the following but received an error.
DF.loc[DF.groupby(['EmpID','Date'])['Reason'].head(1), 'Reason'] = '100'
This raises:
KeyError: "None of [Float64Index([ 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0,\n 800.0,\n ...\n 800.0, 0.0, 100.0, 800.0, 0.0, 100.0, 800.0, 0.0, 100.0,\n 800.0],\n dtype='float64', name='EEID', length=12769)] are in the [index]"
Thank you for any help!
CodePudding user response:
Check duplicated
df.loc[~df['EmpID'].duplicated(),'Reason'] = 100
CodePudding user response:
You can use .transform
. Just define a function that returns a mask: with all the values False
except for the first one: True
.
With the combined mask of all the groups you can then select and replace the elements that you need:
def f(s):
mask = np.full(len(s), False)
mask[0] = True
return mask
mask = df.groupby('EmpID')['Reason'].transform(f)
df.loc[mask, 'Reason'] = 100
Result:
EmpID Date Reason
0 123 2022-01-01 100
1 123 2022-01-02 500
2 123 2022-01-03 0
3 124 2022-01-01 100
4 124 2022-01-02 800
5 124 2022-01-03 0