Create a new column to track monthly changes in employees salary and position over time in dataframe-CodePudding

I have a big dataframe indexed by date that contains info from employees:

DATE             USER_ID            POSITION_ID           SALARY
2019-12-01       101                A1                   1000
2019-12-01       103                A2                   500
2019-12-01       105                C2                   15000
2020-01-01       101                B1                   1100
2020-01-01       103                A2                   500
2019-12-01       105                C3                   15000

I want to create a new column that monthly traces whether an employee got a promotion or just simply changed of position. Something like that:

DATE             USER_ID            POSITION_ID           SALARY      EMPLOYEE_MOVEMENT
2019-12-01       101                A1                   1000         NONE
2019-12-01       103                A2                   500          NONE
2019-12-01       105                C2                   15000        NONE
2020-01-01       101                B1                   1100         PROMOTION
2020-01-01       103                A2                   500          NONE
2020-01-01       105                C2                   16000        RAISE

So far I have reached the desired output by slicing my dataframe by months, df1 contains December 2019 and df2 contains January 2021 and merging them again to create new columns for their position and salry by month.

USER_ID       DEC_POSITION_ID  DEC_SALARY      JAN_POSITION_ID   JAN_SALARY
101           A1               1000            B1                  1100    
103           A2               500             A2                  500     
105           C2               16000           C2                  15000

Then I am able to make a comparison between the two dataframes:

conditions=[(df["JAN_POSITION_ID"] == df["DEC_POSITION_ID"]) & (df["JAN_SALARY"] == df["DEC_SALARY"]),
(df["JAN_POSITION_ID"] != df["DEC_POSITION_ID"]) & (df["JAN_SALARY"] > df["DEC_SALARY"]),
(df["JAN_POSITION_ID"] == df["DEC_POSITION_ID"]) & (df["JAN_SALARY"] > df["DEC_SALARY"])]

values=["NONE", "PROMOTION", "RAISE"]

df["MOVEMENT"] = np.select(conditions, values)

Then I get the following output:

USER_ID       DEC_POSITION_ID  DEC_SALARY      JAN_POSITION_ID   JAN_SALARY        MOVEMENT
101           A1               1000            B1                  1100            PROMOTION
103           A2               500             A2                  500              NONE
105           C2               16000           C2                  15000            RAISE

But since it is a huge data set (three years of info) I want to trace the monthy movements (each month compared to the previous one) of employees without slicing my dataframe and get something like I have already mentioned:

DATE             USER_ID            POSITION_ID           SALARY      MOVEMENT
2019-12-01       101                A1                   1000         NONE
2019-12-01       103                A2                   500          NONE
2019-12-01       105                C2                   15000        NONE
2020-01-01       101                B1                   1100         PROMOTION
2020-01-01       103                A2                   500          NONE
2020-01-01       105                C2                   16000        RAISE

Any thoughts on that? Thank you so much for your help!

CodePudding user response：

Here's a solution but it only works if you have every combination observation of month/user-id

import pandas as pd

#Test table
df = pd.DataFrame({
    'DATE': ['2019-12-01','2019-12-01','2019-12-01','2020-01-01','2020-01-01','2020-01-01'],
    'USER_ID': [101, 103, 105, 101, 103, 105],
    'POSITION_ID': ['A1', 'A2', 'C2', 'B1', 'A2', 'C3'],
    'SALARY': [1000, 500, 15000, 1100, 500, 15000]
})
df['DATE'] = pd.to_datetime(df['DATE'])


#Groupby and shift to compare the position and salary month to month
prev_salary = df.groupby('USER_ID')['SALARY'].shift(1)
prev_position = df.groupby('USER_ID')['POSITION_ID'].shift(1)

changed_positions = prev_position.notnull() & df['POSITION_ID'].ne(prev_position)
pay_raise = prev_salary.notnull() & df['SALARY'].gt(prev_salary)

#Create the MOVEMENT column
df.loc[changed_positions & ~pay_raise,'MOVEMENT'] = 'Changed position without raise'
df.loc[changed_positions & pay_raise,'MOVEMENT'] = 'Changed position with raise'
df.loc[~changed_positions & pay_raise,'MOVEMENT'] = 'Same position with raise'

df