Home > Enterprise >  Replace the first value of each group
Replace the first value of each group

Time:08-30

I have a df sorted by EmpID and Date. Now I need to update the Reason column of the first occurrence for each EmpID with the value 100.

Original:

EmpID Date Reason
123 01/01/2022 0
123 01/02/2022 500
123 01/03/2022 0
124 01/01/2022 0
124 01/02/2022 800
124 01/03/2022 0

Result:

EmpID Date Reason
123 01/01/2022 100
123 01/02/2022 500
123 01/03/2022 0
124 01/01/2022 100
124 01/02/2022 800
124 01/03/2022 0

I'm not getting close in any attempts. Would it be easier to update the min date of each EmpID and set Reason to 100?

I have tried the following but received an error.

DF.loc[DF.groupby(['EmpID','Date'])['Reason'].head(1), 'Reason'] = '100'

This raises:

KeyError: "None of [Float64Index([ 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0, 800.0, 0.0,\n 800.0,\n ...\n 800.0, 0.0, 100.0, 800.0, 0.0, 100.0, 800.0, 0.0, 100.0,\n 800.0],\n dtype='float64', name='EEID', length=12769)] are in the [index]"

Thank you for any help!

CodePudding user response:

Check duplicated

df.loc[~df['EmpID'].duplicated(),'Reason'] = 100

CodePudding user response:

You can use .transform. Just define a function that returns a mask: with all the values False except for the first one: True.

With the combined mask of all the groups you can then select and replace the elements that you need:

def f(s):
    mask = np.full(len(s), False)
    mask[0] = True
    return mask

mask = df.groupby('EmpID')['Reason'].transform(f)
df.loc[mask, 'Reason'] = 100

Result:

   EmpID       Date  Reason
0    123 2022-01-01     100
1    123 2022-01-02     500
2    123 2022-01-03       0
3    124 2022-01-01     100
4    124 2022-01-02     800
5    124 2022-01-03       0
  • Related