I work with Pandas and I am trying to create a column where the value is increased and especially reset by condition based on the Time column Input data:
Out[73]:
ID Time Job Level Counter
0 1 17 a
1 1 18 a
2 1 19 a
3 1 20 a
4 1 21 a
5 1 22 b
6 1 23. b
7 1 24. b
8 2 10. a
9 2 11 a
10 2 12 a
11 2 13 a
12 2 14. b
13 2 15 b
14 2 16 b
15 2 17 c
16 2 18 c
I want to create a new vector 'count' where the value within each ID group remains the same before the first change and start from zero every time a change in the Job level is encountered while remains equal to Time before the first change or no change.
What I would like to have:
ID Time Job Level Counter
0 1 17 a 17
1 1 18 a 18
2 1 19 a 19
3 1 20 a 20
4 1 21 a 21
5 1 22 b 0
6 1 23 b 1
7 1 24 b 2
8 2 10 a 10
9 2 11 a 11
10 2 12 a 12
11 2 13 a 13
12 2 14 b 0
13 2 15 b 1
14 2 16 b 2
15 2 17 c 0
16 2 18 c 1
This is what I tried
df = df.sort_values(['ID']).reset_index(drop=True)
df['Counter'] = promo_details.groupby('ID')['job_level'].apply(lambda x: x.shift()!=x)
def func(group):
group.loc[group.index[0],'Counter']=group.loc[group.index[0],'time_in_level']
return group
df = df.groupby('emp_id').apply(func)
df['Counter'] = df['Counter'].replace(True,'a')
df['Counter'] = np.where(df.Counter == False,df['Time'],df['Counter'])
df['Counter'] = df['Counter'].replace('a',0)
This is not creating a cumulative change after the first change while preserving counts before it,
CodePudding user response:
Use GroupBy.cumcount
for counter with filter first group - there is added values from column Time
:
#if need test consecutive duplicates
s = df['Job Level'].ne(df['Job Level'].shift()).cumsum()
m = s.groupby(df['ID']).transform('first').eq(s)
df['Counter'] = np.where(m, df['Time'], df.groupby(['ID', s]).cumcount())
print (df)
ID Time Job Level Counter
0 1 17 a 17
1 1 18 a 18
2 1 19 a 19
3 1 20 a 20
4 1 21 a 21
5 1 22 b 0
6 1 23 b 1
7 1 24 b 2
8 2 10 a 10
9 2 11 a 11
10 2 12 a 12
11 2 13 a 13
12 2 14 b 0
13 2 15 b 1
14 2 16 b 2
15 2 17 c 0
16 2 18 c 1
Or:
#if each groups are unique
m = df.groupby('ID')['Job Level'].transform('first').eq(df['Job Level'])
df['Counter'] = np.where(m, df['Time'], df.groupby(['ID', 'Job Level']).cumcount())
Difference in changed data:
print (df)
ID Time Job Level
12 2 14 b
13 2 15 b
14 2 16 b
15 2 17 c
16 2 18 c
10 2 12 a
11 2 18 a
12 2 19 b
13 2 20 b
#if need test consecutive duplicates
s = df['Job Level'].ne(df['Job Level'].shift()).cumsum()
m = s.groupby(df['ID']).transform('first').eq(s)
df['Counter1'] = np.where(m, df['Time'], df.groupby(['ID', s]).cumcount())
m = df.groupby('ID')['Job Level'].transform('first').eq(df['Job Level'])
df['Counter2'] = np.where(m, df['Time'], df.groupby(['ID', 'Job Level']).cumcount())
print (df)
ID Time Job Level Counter1 Counter2
12 2 14 b 14 14
13 2 15 b 15 15
14 2 16 b 16 16
15 2 17 c 0 0
16 2 18 c 1 1
10 2 12 a 0 0
11 2 18 a 1 1
12 2 19 b 0 19
13 2 20 b 1 20