Assume I have the following raw dataframe composes of some students information:
STUDENT_ID SEMESTER COURSE_NAME
50694 21 X1
50694 21 X2
50285 21 X3
51442 21 X4
51600 21 X5
50285 21 X6
50494 21 X7
50494 21 X8
50802 21 X9
50802 21 X10
50802 21 X11
50285 21 X12
I want to (1) group them by the STUDENT_ID
so that each student's courses are group together, to look like this:
STUDENT_ID SEMESTER COURSE_NAME
50694 21 X1
50694 21 X2
50494 21 X7
50494 21 X8
50285 21 X3
50285 21 X6
50285 21 X12
50802 21 X9
50802 21 X10
50802 21 X11
51442 21 X4
51600 21 X5
And (2) split them by a specifc value in the STUDENT_ID
column, say 50802
to get:
STUDENT_ID SEMESTER COURSE_NAME
50694 21 X1
50694 21 X2
50494 21 X7
50494 21 X8
50285 21 X3
50285 21 X6
50285 21 X12
and
STUDENT_ID SEMESTER COURSE_NAME
50802 21 X9
50802 21 X10
50802 21 X11
51442 21 X4
51600 21 X5
Any help is highly appreciated.
CodePudding user response:
Use df.sort_values
to sort the dataframe, and then use np.split
to split the dataframe at the specified index(es):
df = df.sort_values('STUDENT_ID').reset_index(drop=True)
splits = np.split(df, [df['STUDENT_ID'].eq(50802).idxmax()])
Output:
>>> splits
[ STUDENT_ID SEMESTER COURSE_NAME
0 50285 21 X3
1 50285 21 X6
2 50285 21 X12
3 50494 21 X7
4 50494 21 X8
5 50694 21 X1
6 50694 21 X2,
STUDENT_ID SEMESTER COURSE_NAME
7 50802 21 X9
8 50802 21 X10
9 50802 21 X11
10 51442 21 X4
11 51600 21 X5]
>>> splits[0]
STUDENT_ID SEMESTER COURSE_NAME
0 50285 21 X3
1 50285 21 X6
2 50285 21 X12
3 50494 21 X7
4 50494 21 X8
5 50694 21 X1
6 50694 21 X2
>>> splits[1]
STUDENT_ID SEMESTER COURSE_NAME
7 50802 21 X9
8 50802 21 X10
9 50802 21 X11
10 51442 21 X4
11 51600 21 X5
CodePudding user response:
The first step is a simple sort_values
. For the second one, the condition is unclear but you could use groupby
:
dfs = [g for _,g in
(df
.sort_values(by='STUDENT_ID')
.groupby(df['STUDENT_ID'].lt(50802))
)]
Output:
[ STUDENT_ID SEMESTER COURSE_NAME
8 50802 21 X9
9 50802 21 X10
10 50802 21 X11
3 51442 21 X4
4 51600 21 X5,
STUDENT_ID SEMESTER COURSE_NAME
2 50285 21 X3
5 50285 21 X6
11 50285 21 X12
6 50494 21 X7
7 50494 21 X8
0 50694 21 X1
1 50694 21 X2]