I am attempting to create a dataframe column that assigns a sequential number for each change in a combination of values. Each combination of student and term represents a group. My data looks like this.
student | year |
---|---|
A | 20211 |
A | 20222 |
A | 20222 |
A | 20225 |
B | 20211 |
B | 20211 |
B | 20227 |
C | 20211 |
C | 20222 |
C | 20229 |
And I want to assign values in a new column to indicate each unique student and year combination. I've tried sort_values with groupby and cumcount but I'm getting a sequence of rows not just when the year value changes. This is what I want
student | year | enrollment |
---|---|---|
A | 20211 | 1 |
A | 20222 | 2 |
A | 20222 | 2 |
A | 20225 | 3 |
B | 20211 | 1 |
B | 20211 | 1 |
B | 20227 | 2 |
C | 20211 | 1 |
C | 20222 | 2 |
C | 20229 | 3 |
CodePudding user response:
You can use pd.factorize
per student group:
df['enrollment'] = df.groupby('student')['year'] \
.transform(lambda x: pd.factorize(x)[0] 1)
print(df)
# Output:
student year enrollment
0 A 20211 1
1 A 20222 2
2 A 20222 2
3 A 20225 3
4 B 20211 1
5 B 20211 1
6 B 20227 2
7 C 20211 1
8 C 20222 2
9 C 20229 3