I have result, student, version and status columns. in this I want to do group by using Student , Version and result = pass count and result = fail count
similarly like df.groupby(["student", "version", "result=pass"]).size().reset_index(name="new_result")
the below is my data frame
result | student | version | status | Failed Subject |
---|---|---|---|---|
pass | Student-A | L-1.0 | Active | |
fail | Student-A | L-1.0 | Active | Mathematics |
fail | Student-A | L-1.0 | Active | Physics |
pass | Student-A | M-1.0 | Active | |
fail | Student-A | M-1.0 | Active | Science |
pass | Student-B | N-1.0 | Active | |
pass | Student-B | N-1.0 | Active | |
pass | Student-B | N-1.0 | Active | |
pass | Student-C | O-1.0 | Active | |
pass | Student-C | O-1.0 | Active | |
fail | Student-C | O-1.0 | Active | English |
fail | Student-C | P-1.0 | Active | Computers |
fail | Student-C | P-1.0 | Active | Mathematics |
I want my output data frame as below:
student | version | pass_count | fail_count | status | total_count (pass fail) | Failed Subject |
---|---|---|---|---|---|---|
Student-A | L-1.0 | 1 | 2 | Active | 3 | Mathematics,Physics |
Student-A | M-1.0 | 1 | 1 | Active | 2 | Science |
Student-B | N-1.0 | 3 | 0 | Active | 3 | |
Student-C | O-1.0 | 1 | 1 | Active | 2 | English |
Student-C | P-1.0 | 0 | 2 | Active | 2 | Computers,Mathematics |
I am able to get pass and fail count using the below but not total count, anyone pls help
pd.pivot_table(master_df, index=['status', 'student', 'version'], columns=['result'], aggfunc=len, fill_value=0)
CodePudding user response:
You can use .groupby()
agg()
, as follows:
df_out = (df.groupby(['student', 'version', 'status'], as_index=False)
.agg(**{'pass_count': ('result', lambda x: x[x == 'pass'].size),
'fail_count': ('result', lambda x: x[x == 'fail'].size),
'total_count': ('result', 'size'),
'Failed Subject': ('Failed Subject', lambda x: ','.join(x.dropna()))
})
)
Result:
print(df_out)
student version status pass_count fail_count total_count Failed Subject
0 Student-A L-1.0 Active 1 2 3 Mathematics,Physics
1 Student-A M-1.0 Active 1 1 2 Science
2 Student-B N-1.0 Active 3 0 3
3 Student-C O-1.0 Active 2 1 3 English
4 Student-C P-1.0 Active 0 2 2 Computers,Mathematics
CodePudding user response:
@SeaBean which python pandas version you have used ??
CodePudding user response:
How can we achieve this by using python 2.7 and pandas 0.23.0 version, could you pls help