I have a df as follows:
In DT_INI DT_FIM Status Description
0 IN100 01/01/2022 01/02/2022 Encerrado Abend no job XX_01
1 IN200 01/02/2022 01/03/2022 Encerrado Abend no job XX_01
2 IN300 01/03/2022 01/04/2022 Encerrado Abend no job XX_02
3 IN400 01/04/2022 01/05/2022 Encerrado Abend no job XX_03
4 IN500 01/05/2022 01/06/2022 Encerrado Abend no job XX_03
I'm trying to write a very simple program that takes a list of values, jobName (in this case XX_01 and XX_02).
The program should search the Description field, keep only the rows whose description contains one of the values in the list, and add a new column, jobName, holding the matched value.
I was able to create the filtered list:
list_dados = []
for i in jobName:
    list_dados.append(dados_df_2.loc[dados_df_2['Description'].str.contains(i)])
pd.concat(list_dados)
But I couldn't create/add the new jobName column; I tried a few things and none of them worked.
The output I'm looking for is as follows:
In jobName DT_INI DT_FIM Status Description
0 IN100 XX_01 01/01/2022 01/02/2022 Encerrado Abend no job XX_01
1 IN200 XX_01 01/02/2022 01/03/2022 Encerrado Abend no job XX_01
2 IN300 XX_02 01/03/2022 01/04/2022 Encerrado Abend no job XX_02
Could you guys help me?
Answer:
You can try .str.extract
jobName = ['XX_01', 'XX_02']
df['jobName'] = df['Description'].str.extract(r'\b(' + '|'.join(jobName) + r')\b')
print(df)
In DT_INI DT_FIM Status Description jobName
0 IN100 01/01/2022 01/02/2022 Encerrado Abend no job XX_01 XX_01
1 IN200 01/02/2022 01/03/2022 Encerrado Abend no job XX_01 XX_01
2 IN300 01/03/2022 01/04/2022 Encerrado Abend no job XX_02 XX_02
3 IN400 01/04/2022 01/05/2022 Encerrado Abend no job XX_03 NaN
4 IN500 01/05/2022 01/06/2022 Encerrado Abend no job XX_03 NaN
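For anyone who wants to reproduce the result above, here is a minimal self-contained sketch. The DataFrame is rebuilt by hand from the sample in the question; the use of re.escape and expand=False is my own addition (a precaution in case a job name contains regex metacharacters, and to get a Series back instead of a one-column DataFrame), not something the original snippet requires.

```python
import re
import pandas as pd

# Sample data rebuilt from the table in the question.
df = pd.DataFrame({
    'In': ['IN100', 'IN200', 'IN300', 'IN400', 'IN500'],
    'DT_INI': ['01/01/2022', '01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022'],
    'DT_FIM': ['01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022', '01/06/2022'],
    'Status': ['Encerrado'] * 5,
    'Description': ['Abend no job XX_01', 'Abend no job XX_01', 'Abend no job XX_02',
                    'Abend no job XX_03', 'Abend no job XX_03'],
})

jobName = ['XX_01', 'XX_02']

# Build one alternation pattern; re.escape is an added precaution for
# job names that might contain regex metacharacters.
pattern = r'\b(' + '|'.join(re.escape(name) for name in jobName) + r')\b'

# expand=False makes str.extract return a Series for the single capture group.
df['jobName'] = df['Description'].str.extract(pattern, expand=False)
print(df)
```

Rows whose Description matches none of the names get NaN in jobName, which is what makes the dropna step possible.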
To also filter by jobName, you can call dropna on the extracted jobName column:
out = (df.assign(jobName=df['Description'].str.extract(r'(' + '|'.join(jobName) + r')'))
         .dropna(subset=['jobName']))
print(out)
In DT_INI DT_FIM Status Description jobName
0 IN100 01/01/2022 01/02/2022 Encerrado Abend no job XX_01 XX_01
1 IN200 01/02/2022 01/03/2022 Encerrado Abend no job XX_01 XX_01
2 IN300 01/03/2022 01/04/2022 Encerrado Abend no job XX_02 XX_02
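Note that the output requested in the question has jobName as the second column, right after In. A possible way to get that layout is DataFrame.insert, sketched below. The helper name filter_by_job is hypothetical (it appears nowhere in the question), and the sample frame is again rebuilt by hand, using the question's dados_df_2 name.

```python
import re
import pandas as pd

def filter_by_job(frame, job_names):
    """Hypothetical helper: keep rows whose Description mentions one of
    job_names and place the matched name in a 'jobName' column at position 1."""
    pattern = '(' + '|'.join(re.escape(name) for name in job_names) + ')'
    extracted = frame['Description'].str.extract(pattern, expand=False)
    matched = extracted.notna()
    out = frame.loc[matched].copy()
    # insert() modifies 'out' in place; position 1 puts the column right after 'In'.
    out.insert(1, 'jobName', extracted[matched])
    return out

# Sample data rebuilt from the table in the question.
dados_df_2 = pd.DataFrame({
    'In': ['IN100', 'IN200', 'IN300', 'IN400', 'IN500'],
    'DT_INI': ['01/01/2022', '01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022'],
    'DT_FIM': ['01/02/2022', '01/03/2022', '01/04/2022', '01/05/2022', '01/06/2022'],
    'Status': ['Encerrado'] * 5,
    'Description': ['Abend no job XX_01', 'Abend no job XX_01', 'Abend no job XX_02',
                    'Abend no job XX_03', 'Abend no job XX_03'],
})

result = filter_by_job(dados_df_2, ['XX_01', 'XX_02'])
print(result)
```

Compared with the loop over jobName in the question, this keeps the original row order and avoids duplicating a row if its description happens to mention two job names.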