Home > OS >  pandas group by on column values and extract one column text
pandas group by on column values and extract one column text

Time:09-26

I have result, student, version and status columns. in this I want to do group by using Student , Version and result = pass count and result = fail count

similarly like df.groupby(["student", "version", "result=pass"]).size().reset_index(name="new_result")

the below is my data frame

result student version status Failed Subject
pass Student-A L-1.0 Active
fail Student-A L-1.0 Active Mathematics
fail Student-A L-1.0 Active Physics
pass Student-A M-1.0 Active
fail Student-A M-1.0 Active Science
pass Student-B N-1.0 Active
pass Student-B N-1.0 Active
pass Student-B N-1.0 Active
pass Student-C O-1.0 Active
pass Student-C O-1.0 Active
fail Student-C O-1.0 Active English
fail Student-C P-1.0 Active Computers
fail Student-C P-1.0 Active Mathematics

I want my output data frame as below:

student version pass_count fail_count status total_count (pass fail) Failed Subject
Student-A L-1.0 1 2 Active 3 Mathematics,Physics
Student-A M-1.0 1 1 Active 2 Science
Student-B N-1.0 3 0 Active 3
Student-C O-1.0 1 1 Active 2 English
Student-C P-1.0 0 2 Active 2 Computers,Mathematics

I am able to get pass and fail count using the below but not total count, anyone pls help

pd.pivot_table(master_df, index=['status', 'student', 'version'], columns=['result'], aggfunc=len, fill_value=0)

CodePudding user response:

You can use .groupby() agg(), as follows:

df_out = (df.groupby(['student', 'version', 'status'], as_index=False)
            .agg(**{'pass_count': ('result', lambda x: x[x == 'pass'].size),
                    'fail_count': ('result', lambda x: x[x == 'fail'].size),                    
                    'total_count': ('result', 'size'),
                    'Failed Subject': ('Failed Subject', lambda x: ','.join(x.dropna()))
                    })
          )

Result:

print(df_out)

     student version  status  pass_count  fail_count  total_count         Failed Subject
0  Student-A   L-1.0  Active           1           2            3    Mathematics,Physics
1  Student-A   M-1.0  Active           1           1            2                Science
2  Student-B   N-1.0  Active           3           0            3                       
3  Student-C   O-1.0  Active           2           1            3                English
4  Student-C   P-1.0  Active           0           2            2  Computers,Mathematics

CodePudding user response:

@SeaBean which python pandas version you have used ??

CodePudding user response:

How can we achieve this by using python 2.7 and pandas 0.23.0 version, could you pls help

  • Related