Count occurrences and create a column based on the count in Python

Probably a silly question for most of you.

I have this list:

      In      DT_INI      DT_FIM     Status         Description jobName
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01   XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01   XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02   XX_02

I need to count how many In values each jobName has in this list and store that count in a new column named Qt_Ins.

It should look like this:

      In jobName      DT_INI      DT_FIM     Status         Description  Qt_Ins
0  IN100   XX_01  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01       2
1  IN200   XX_01  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01       2
2  IN300   XX_02  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02       1

Could you guys help me again?

Thanks

CodePudding user response：

Input:

import pandas as pd
from io import StringIO

s = """      In      DT_INI      DT_FIM     Status         Description  jobName
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01   XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01   XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02   XX_02"""

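# Split on runs of two or more whitespace characters so that
# "Abend no job XX_01" stays in a single column.
df = pd.read_table(StringIO(s), sep=r"\s\s+", engine="python")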

You can use groupby with nunique:

# Store columns
cols = list(df.columns)
df.set_index("jobName", inplace=True)
# Do the group by
df["Qt_Ins"] = df.groupby("jobName")["In"].nunique()
# Re-order columns
df = df.reset_index()[cols + ["Qt_Ins"]]
print(df.to_string())

Output:

      In      DT_INI      DT_FIM     Status         Description jobName  Qt_Ins
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01   XX_01       2
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01   XX_01       2
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02   XX_02       1
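
As an alternative sketch (not part of the original answer), the set_index/reset_index round trip can be avoided with groupby(...).transform("nunique"), which returns one value per original row instead of one per group. The DataFrame below is rebuilt by hand from the sample data in the question, so the construction itself is an assumption about how the real data is loaded:

```python
import pandas as pd

# Sample data from the question, built directly rather than parsed from text.
df = pd.DataFrame({
    "In": ["IN100", "IN200", "IN300"],
    "DT_INI": ["01/01/2022", "01/02/2022", "01/03/2022"],
    "DT_FIM": ["01/02/2022", "01/03/2022", "01/04/2022"],
    "Status": ["Encerrado", "Encerrado", "Encerrado"],
    "Description": ["Abend no job XX_01", "Abend no job XX_01", "Abend no job XX_02"],
    "jobName": ["XX_01", "XX_01", "XX_02"],
})

# transform computes the number of distinct "In" values per jobName and
# broadcasts it back to every row of that group, keeping the original index.
df["Qt_Ins"] = df.groupby("jobName")["In"].transform("nunique")

print(df.to_string())
```

If repeated In values should each be counted, transform("count") can be used in place of transform("nunique"); on this sample both give 2, 2, 1.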