Following is my dataset:
class | StudentName | Age | Height | Weight | PassedExam |
---|---|---|---|---|---|
5 | Joe | 10 | 150 | 56 | Pass |
5 | Nick | 10 | 153 | 54 | Fail |
5 | Adam | 10 | 156 | 61 | Pass |
4 | Julia | 9 | 148 | 50 | Pass |
7 | George | 12 | 156 | 59 | Fail |
5 | Robert | 10 | 157 | 60 | Pass |
6 | Sam | 11 | 155 | 60 | Pass |
6 | travis | 11 | 153 | 63 | Pass |
I want select Students if any of the below conditions apply:
Height greater than 150
Passed Exam is equal to 'Pass'
And further create two columns valid and Invalid in output as follows:
class | ValidStudent | InvalidStudent | Weight |
---|---|---|---|
5 | Adam,Robert | Joe,Nick | 51,60,56,54 |
6 | Sam,travis | 60,63 | |
4 | Julia | 50 | |
7 | George | 59 |
Following is my code to select rows with filter values but I am not able to create the columns:
df.loc[(df['PassedExam'] == 'Pass') & (df['Height'] > 150)]
CodePudding user response:
You can use:
df_valid = df[(df['PassedExam'] == 'Pass') & (df['Height'] > 150)]
df_valid = df_valid.groupby('class').agg(ValidStudent = pd.NamedAgg(column="StudentName", aggfunc=",".join)).reset_index()
df_invalid = df[(df['PassedExam'] != 'Pass') | (df['Height'] <= 150)]
df_invalid = df_invalid.groupby('class').agg(InvalidStudent = pd.NamedAgg(column="StudentName", aggfunc=",".join)).reset_index()
df_valid = df_valid.merge(df_invalid, how='outer', on='class')
df_valid = df_valid.merge(df.groupby('class').agg(Weight = pd.NamedAgg(column="Weight", aggfunc=",".join)),
how='left', on=['class'])
OUTPUT
class ValidStudent InvalidStudent Weight
0 5 Adam,Robert Joe,Nick 56,54,61,60
1 6 Sam,travis NaN 60,63
2 4 NaN Julia 50
3 7 NaN George 59