Home > Software engineering >  Is there a way to recalculate existing values in df based on conditions? - Python / Pandas
Is there a way to recalculate existing values in df based on conditions? - Python / Pandas

Time:10-12

I have a DataFrame with Employees and their hours for different categories.

I need to recalculate only specific categories (OT, MILE and REST Categories SHOULD NOT Be Updated, ALL Other Should be updated) ONLY if OT category is present under Empl_Id.

data = {'Empl_Id': [1, 1, 1, 1, 1, 2, 2, 2, 3, 3, 3, 3, 3, 3], 
     'Category': ["MILE", "REST", "OT", "TRVL", "REG", "ADMIN", "REST", "REG", "MILE", "OT", "TRVL", "REST", "MAT", "REG"], 
     'Value': [43, 0.7, 6.33, 2.67, 52, 22, 1.17, 16.5, 73.6, 4.75, 1.33, 2.5, 5.5, 52.25]}

df = pd.DataFrame(data=data)

df
Empl_Id Category Value
1 MILE 43
1 REST 0.7
1 OT 6.33
1 TRVL 2.67
1 REG 52
2 ADMIN 22
2 REST 1.17
2 REG 16.5
3 MILE 73.6
3 OT 4.75
3 TRVL 1.33
3 REST 2.5
3 MAT 5.5
3 REG 52.25

The Logic is to:

1) Find % of OT Hours from Total Hours (OT, REST and MILE don't count):

  • 1st Empl_Id: 6.33 (OT) / 2.67 (TRVL) 52 (REG) = 6.33 / 54.67 = 11.58 %
  • 2nd Empl_Id: OT Hours Not present, nothing should be updated
  • 3rd Empl_Id: 4.75 (OT) / 1.33 (TRVL) 5.5 (MAT) 52.25 (REG) = 4.75 / 59.08 = 8.04 %

2) Substract % of OT from each category (OT, REST and MILE don't count):

Empl_Id Category Value
1 MILE 43
1 REST 0.7
1 OT 6.33
1 TRVL 2.67 - 11.58 % (0.31) = 2.36
1 REG 52 - 11.58 % (6.02) = 45.98
2 ADMIN 22
2 REST 1.17
2 REG 16.5
3 MILE 73.6
3 OT 4.75
3 TRVL 1.33 - 8.04 % (0.11) = 1.22
3 REST 2.5
3 MAT 5.5 - 8.04 % (0.44) = 5.06
3 REG 52.25 - 8.04 % (4.2) = 48.05

CodePudding user response:

You can use:

keep = ['OT', 'MILE', 'REST']

# get factor
factor = (df.groupby(df['Empl_Id'])
            .apply(lambda g: g.loc[g['Category'].eq('OT'),'Value'].sum()
                            /g.loc[~g['Category'].isin(keep),'Value'].sum()
                  )
            .rsub(1)
         )

# update
df.loc[~df['Category'].isin(keep), 'Value'] *= df['Empl_Id'].map(factor)

output:

    Empl_Id Category      Value
0         1     MILE  43.000000
1         1     REST   0.700000
2         1       OT   6.330000
3         1     TRVL   2.360852
4         1      REG  45.979148
5         2    ADMIN  22.000000
6         2     REST   1.170000
7         2      REG  16.500000
8         3     MILE  73.600000
9         3       OT   1.750000
10        3     TRVL   1.290604
11        3     REST   2.500000
12        3      MAT   5.337085
13        3      REG  50.702310
  • Related