Home > other >  grouping and counting multiple columns in Pandas
grouping and counting multiple columns in Pandas

Time:09-29

I have the below df and I'm trying to group it by JOB_STREAM_NAME and JOB_NAME and count how many INs happened for this pair

df

    JOB_STREAM_NAME         JOB_NAME            IN          Start_Time              Description                                         
0   P26_NEXT_NBA_DES    PP_NEXT_NBA_AS01A0001_D NaN         NaT                     NaN                                                 
1   P26_NEXT_NBA_TMP    PP_NEXT_NBA_AS01A0001_D NaN         NaT                     NaN                                                 
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7395593   2022-08-13 12:38:39     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
3   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7420940   2022-08-19 14:33:32     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
4   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7444113   2022-08-25 08:06:10     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7395596   2022-08-13 12:39:06     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...

my desired output should be this:

df_2
    JOB_STREAM_NAME         JOB_NAME            IN          Qt_INs  Start_Time              Description                                         
0   P26_NEXT_NBA_DES    PP_NEXT_NBA_AS01A0001_D NaN         0       NaT                     NaN                                                 
1   P26_NEXT_NBA_TMP    PP_NEXT_NBA_AS01A0001_D NaN         0       NaT                     NaN                                                 
2   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7395593   2       2022-08-13 12:38:39     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
3   P26_NEXT_NBA_TOD    PP_NEXT_NBA_AS01A0001_D IN7420940   2       2022-08-19 14:33:32     UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...   
4   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7444113   2       2022-08-25 08:06:10     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5   P26_AAAR_006_TSA    PP_AAAR_4898_DAVMOV_D   IN7395596   2       2022-08-13 12:39:06     UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...

I tried somethings like

df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"]).count()
df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].nunique()

None worked ans intended

could you guys help me?

CodePudding user response:

You need to use transform if your output is supposed to look as in the above:

df["Qt_INs"] =  df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].transform("count")
  • Related