I have the below df and I'm trying to group it by JOB_STREAM_NAME and JOB_NAME and count how many INs happened for this pair
df
JOB_STREAM_NAME JOB_NAME IN Start_Time Description
0 P26_NEXT_NBA_DES PP_NEXT_NBA_AS01A0001_D NaN NaT NaN
1 P26_NEXT_NBA_TMP PP_NEXT_NBA_AS01A0001_D NaN NaT NaN
2 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS01A0001_D IN7395593 2022-08-13 12:38:39 UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...
3 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS01A0001_D IN7420940 2022-08-19 14:33:32 UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...
4 P26_AAAR_006_TSA PP_AAAR_4898_DAVMOV_D IN7444113 2022-08-25 08:06:10 UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5 P26_AAAR_006_TSA PP_AAAR_4898_DAVMOV_D IN7395596 2022-08-13 12:39:06 UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
my desired output should be this:
df_2
JOB_STREAM_NAME JOB_NAME IN Qt_INs Start_Time Description
0 P26_NEXT_NBA_DES PP_NEXT_NBA_AS01A0001_D NaN 0 NaT NaN
1 P26_NEXT_NBA_TMP PP_NEXT_NBA_AS01A0001_D NaN 0 NaT NaN
2 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS01A0001_D IN7395593 2 2022-08-13 12:38:39 UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...
3 P26_NEXT_NBA_TOD PP_NEXT_NBA_AS01A0001_D IN7420940 2 2022-08-19 14:33:32 UT20 O.T.A. >>> ABEND IN JOB PP_NEXT_NBA_AS01A...
4 P26_AAAR_006_TSA PP_AAAR_4898_DAVMOV_D IN7444113 2 2022-08-25 08:06:10 UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
5 P26_AAAR_006_TSA PP_AAAR_4898_DAVMOV_D IN7395596 2 2022-08-13 12:39:06 UT20 O.T.A. >>> ABEND IN JOB PP_AAAR_4898_DAVMOV_D...
I tried somethings like
df["Qt_INs"] = df.groupby(["JOB_STREAM_NAME","JOB_NAME"]).count()
df["Qt_INs"] = df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].nunique()
None worked ans intended
could you guys help me?
CodePudding user response:
You need to use transform if your output is supposed to look as in the above:
df["Qt_INs"] = df.groupby(["JOB_STREAM_NAME","JOB_NAME"])["IN"].transform("count")