I have the following dataframe that contains users ids over time (months):
DATE USER_ID ACTIVITY
2021-06-01 A1 1
2021-06-01 B2 1
2021-06-01 C3 0
2021-07-01 A1 1
2021-07-01 B2 0
2021-07-01 C3 1
2021-08-01 A1 1
2021-08-01 B2 0
2021-08-01 C3 0
2021-09-01 A1 1
2021-09-01 B2 1
2021-09-01 C3 1
The values of the "ACTIVE" column just show 1 (the user is active) and 0 (user not active).
I want to create a new column that monthly traces whether the employee id got activity based on the following conditions:
*if last month id was 0 and current id is 0 then the label is "vacancy stock"
*if last month id was 0 and current id is 1 then the label is "new active"
*if last month id was 1 and current id is 0 then the label is "new vacancy"
*if last month id was 1 and current id is 1 then the label is "active stock"
Considering those conditions, this is the expected dataframe with the new column:
DATE USER_ID ACTIVITY NEW_COLUMN
2021-06-01 A1 1 NaN
2021-06-01 B2 1 NaN
2021-06-01 C3 0 NaN
2021-07-01 A1 1 active stock
2021-07-01 B2 0 new vacancy
2021-07-01 C3 1 new active
2021-08-01 A1 1 active stock
2021-08-01 B2 0 vacancy stock
2021-08-01 C3 0 new vacancy
2021-09-01 A1 1 active stock
2021-09-01 B2 1 new active
2021-09-01 C3 1 new active
Thanks in advance for your help!
CodePudding user response:
You can use groupby.shift
to access the previous period per user and combine the status on a binary-like encoding:
d = {0: 'vacancy stock', 1: 'new active', 10: 'new vacancy', 11: 'active stock'}
df['NEW_COLUMN'] = (df['ACTIVITY']
.add(df.groupby('USER_ID')['ACTIVITY'].shift().mul(10))
.map(d)
)
Output:
DATE USER_ID ACTIVITY NEW_COLUMN
0 2021-06-01 A1 1 NaN
1 2021-06-01 B2 1 NaN
2 2021-06-01 C3 0 NaN
3 2021-07-01 A1 1 active stock
4 2021-07-01 B2 0 new vacancy
5 2021-07-01 C3 1 new active
6 2021-08-01 A1 1 active stock
7 2021-08-01 B2 0 vacancy stock
8 2021-08-01 C3 0 new vacancy
9 2021-09-01 A1 1 active stock
10 2021-09-01 B2 1 new active
11 2021-09-01 C3 1 new active
NB. For a real binary encoding you would multiply by 2 and use the dictionary: d = {0: 'vacancy stock', 1: 'new active', 2: 'new vacancy', 3: 'active stock'}
.