Pandas row number for group results

Time:04-22

I am trying to create a DataFrame column that assigns a sequential number each time a combination of values changes. Each combination of student and term (the year column) represents a group. My data looks like this:

student year
A 20211
A 20222
A 20222
A 20225
B 20211
B 20211
B 20227
C 20211
C 20222
C 20229

I want a new column that numbers each unique student and year combination, restarting at 1 for each student. I've tried sort_values with groupby and cumcount, but that numbers every row in sequence instead of incrementing only when the year value changes. This is what I want:

student year enrollment
A 20211 1
A 20222 2
A 20222 2
A 20225 3
B 20211 1
B 20211 1
B 20227 2
C 20211 1
C 20222 2
C 20229 3
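For reference, here is a minimal sketch that rebuilds the sample data and reproduces the problem. The exact call is an assumption about what the attempt described above looked like, not the original code:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "student": list("AAAABBBCCC"),
    "year": [20211, 20222, 20222, 20225, 20211,
             20211, 20227, 20211, 20222, 20229],
})

# Assumed attempt: cumcount numbers every row within each student group,
# so a repeated year still gets a new number.
attempt = df.sort_values(["student", "year"]).groupby("student").cumcount() + 1
print(attempt.tolist())  # [1, 2, 3, 4, 1, 2, 3, 1, 2, 3]
```

Student A gets 1, 2, 3, 4 instead of 1, 2, 2, 3 because cumcount counts rows, not distinct years.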

CodePudding user response:

You can use pd.factorize per student group:

# factorize codes each distinct year from 0 in order of first appearance, so add 1
df['enrollment'] = df.groupby('student')['year'] \
                     .transform(lambda x: pd.factorize(x)[0] + 1)
print(df)

# Output:
  student   year  enrollment
0       A  20211           1
1       A  20222           2
2       A  20222           2
3       A  20225           3
4       B  20211           1
5       B  20211           1
6       B  20227           2
7       C  20211           1
8       C  20222           2
9       C  20229           3
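As a possible alternative (not part of the answer above), a dense rank per student should give the same numbering on this data. This is a sketch that rebuilds the sample frame; the column name enrollment_rank is only illustrative:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "student": list("AAAABBBCCC"),
    "year": [20211, 20222, 20222, 20225, 20211,
             20211, 20227, 20211, 20222, 20229],
})

# The factorize approach from the answer: numbers years by first appearance
df["enrollment"] = (
    df.groupby("student")["year"]
      .transform(lambda x: pd.factorize(x)[0] + 1)
)

# Dense rank: equal years share a rank and ranks have no gaps.
# rank() returns floats, so cast to int (safe here because there are no NaNs).
df["enrollment_rank"] = (
    df.groupby("student")["year"].rank(method="dense").astype(int)
)

print(df)
```

The two differ when the years are not sorted within a student: factorize numbers values in order of first appearance, while a dense rank numbers them by sorted value and does not depend on row order.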