A silly question for most of you.
I have this list:
In DT_INI DT_FIM Status Description jobName
0 IN100 01/01/2022 01/02/2022 Encerrado Abend no job XX_01 XX_01
1 IN200 01/02/2022 01/03/2022 Encerrado Abend no job XX_01 XX_01
2 IN300 01/03/2022 01/04/2022 Encerrado Abend no job XX_02 XX_02
I need to count how many In values each jobName has in this list and put that count in a new column named Qt_Ins.
It should look like this:
In jobName DT_INI DT_FIM Status Description Qt_Ins
0 IN100 XX_01 01/01/2022 01/02/2022 Encerrado Abend no job XX_01 2
1 IN200 XX_01 01/02/2022 01/03/2022 Encerrado Abend no job XX_01 2
2 IN300 XX_02 01/03/2022 01/04/2022 Encerrado Abend no job XX_02 1
Could you guys help me again?
Thanks
Answer:
Input:
import pandas as pd
from io import StringIO

# Columns are separated by two or more spaces, so the single spaces inside
# "Abend no job XX_01" stay within the Description field
s = """    In      DT_INI      DT_FIM     Status         Description  jobName
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01  XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01  XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02  XX_02"""
df = pd.read_table(StringIO(s), sep=r"\s\s+", engine="python")
You can use groupby with nunique:
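# Store the original column order
cols = list(df.columns)
df.set_index("jobName", inplace=True)
# Count distinct In values per jobName; the result is indexed by jobName,
# so it aligns with df's jobName index on assignment
df["Qt_Ins"] = df.groupby("jobName")["In"].nunique()
# Restore jobName as a column and append Qt_Ins at the end
df = df.reset_index()[cols + ["Qt_Ins"]]
print(df.to_string())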
Output:
In DT_INI DT_FIM Status Description jobName Qt_Ins
0 IN100 01/01/2022 01/02/2022 Encerrado Abend no job XX_01 XX_01 2
1 IN200 01/02/2022 01/03/2022 Encerrado Abend no job XX_01 XX_01 2
2 IN300 01/03/2022 01/04/2022 Encerrado Abend no job XX_02 XX_02 1
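A possibly simpler variant (a sketch, not part of the original answer): groupby(...).transform("nunique") returns a result aligned to the original rows, so the set_index/reset_index steps are not needed. The DataFrame below is built inline from the sample data as an assumption, and the final column order follows the layout asked for in the question (jobName right after In).

```python
import pandas as pd

# Sample data from the question, built directly instead of parsed from text
df = pd.DataFrame({
    "In": ["IN100", "IN200", "IN300"],
    "DT_INI": ["01/01/2022", "01/02/2022", "01/03/2022"],
    "DT_FIM": ["01/02/2022", "01/03/2022", "01/04/2022"],
    "Status": ["Encerrado"] * 3,
    "Description": ["Abend no job XX_01", "Abend no job XX_01", "Abend no job XX_02"],
    "jobName": ["XX_01", "XX_01", "XX_02"],
})

# transform broadcasts each group's distinct-In count back onto every row
df["Qt_Ins"] = df.groupby("jobName")["In"].transform("nunique")

# Reorder so jobName sits right after In, as in the desired output
df = df[["In", "jobName", "DT_INI", "DT_FIM", "Status", "Description", "Qt_Ins"]]
print(df.to_string())
```

If the same In can appear more than once for a job and every occurrence should be counted, transform("count") counts non-null rows per group instead of distinct values.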