Adding columns and its values to a list in Pandas based on a filter

Time:08-18

I have a df as follows:

      In      DT_INI      DT_FIM     Status         Description
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02
3  IN400  01/04/2022  01/05/2022  Encerrado  Abend no job XX_03
4  IN500  01/05/2022  01/06/2022  Encerrado  Abend no job XX_03

I'm trying to write a very simple program that takes a list of values (job names, in this case XX_01 and XX_02), searches the Description field, and builds a new list containing only the rows whose Description matches one of those values. It should also add a new column, jobName, holding the matched job name.

I was able to create the filtered list:

list_dados = []
for i in jobName:
    list_dados.append(dados_df_2.loc[dados_df_2['Description'].str.contains(i)])

pd.concat(list_dados)

But I couldn't add the new jobName column. I tried a few things and none of them worked.

The output I'm looking for is as follows:

      In jobName      DT_INI      DT_FIM     Status         Description
0  IN100   XX_01  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01
1  IN200   XX_01  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01
2  IN300   XX_02  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02

Could you help me?

CodePudding user response:

You can try .str.extract

jobName = ['XX_01', 'XX_02']

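# Build one alternation pattern from the job names; \b on both sides keeps matches on word boundaries
df['jobName'] = df['Description'].str.extract(r'\b(' + '|'.join(jobName) + r')\b')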
print(df)

      In      DT_INI      DT_FIM     Status         Description jobName
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01   XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01   XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02   XX_02
3  IN400  01/04/2022  01/05/2022  Encerrado  Abend no job XX_03     NaN
4  IN500  01/05/2022  01/06/2022  Encerrado  Abend no job XX_03     NaN
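
If the job names could ever contain regex metacharacters (a dot or a parenthesis, say), escaping them before joining is safer. Below is a minimal, self-contained sketch of the same extraction. It assumes the sample data from the question; the `pattern` variable, the use of `re.escape`, and `expand=False` are additions for illustration and are not part of the original answer.

```python
import re

import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "In": ["IN100", "IN200", "IN300", "IN400", "IN500"],
    "DT_INI": ["01/01/2022", "01/02/2022", "01/03/2022", "01/04/2022", "01/05/2022"],
    "DT_FIM": ["01/02/2022", "01/03/2022", "01/04/2022", "01/05/2022", "01/06/2022"],
    "Status": ["Encerrado"] * 5,
    "Description": [
        "Abend no job XX_01",
        "Abend no job XX_01",
        "Abend no job XX_02",
        "Abend no job XX_03",
        "Abend no job XX_03",
    ],
})
jobName = ["XX_01", "XX_02"]

# Escape each name so it is matched literally, then join into one alternation
pattern = r"\b(" + "|".join(re.escape(name) for name in jobName) + r")\b"

# expand=False returns a Series for a single capture group; rows without a match get NaN
df["jobName"] = df["Description"].str.extract(pattern, expand=False)
print(df)
```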

To also filter by jobName, you can call dropna on the extracted jobName column:

# Extract the job name, then drop the rows where nothing matched
out = (df.assign(jobName=df['Description'].str.extract(r'(' + '|'.join(jobName) + r')'))
       .dropna(subset=['jobName']))
print(out)

      In      DT_INI      DT_FIM     Status         Description jobName
0  IN100  01/01/2022  01/02/2022  Encerrado  Abend no job XX_01   XX_01
1  IN200  01/02/2022  01/03/2022  Encerrado  Abend no job XX_01   XX_01
2  IN300  01/03/2022  01/04/2022  Encerrado  Abend no job XX_02   XX_02
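
The loop from the question can also be made to work by tagging each filtered chunk with the name that produced it. This is a rough sketch of that alternative rather than the approach the answer recommends. It assumes the sample data from the question; `mask` and `result` are illustrative names, and the last step moves jobName next to In to match the requested layout.

```python
import pandas as pd

# Sample data rebuilt from the question
dados_df_2 = pd.DataFrame({
    "In": ["IN100", "IN200", "IN300", "IN400", "IN500"],
    "DT_INI": ["01/01/2022", "01/02/2022", "01/03/2022", "01/04/2022", "01/05/2022"],
    "DT_FIM": ["01/02/2022", "01/03/2022", "01/04/2022", "01/05/2022", "01/06/2022"],
    "Status": ["Encerrado"] * 5,
    "Description": [
        "Abend no job XX_01",
        "Abend no job XX_01",
        "Abend no job XX_02",
        "Abend no job XX_03",
        "Abend no job XX_03",
    ],
})
jobName = ["XX_01", "XX_02"]

list_dados = []
for name in jobName:
    # regex=False treats the name as a plain substring
    mask = dados_df_2["Description"].str.contains(name, regex=False)
    # assign() returns a copy of the filtered rows with the new column added
    list_dados.append(dados_df_2.loc[mask].assign(jobName=name))

result = pd.concat(list_dados)

# Move jobName to the second position, right after In
result.insert(1, "jobName", result.pop("jobName"))
print(result)
```

One caveat with the loop: a row whose Description contains more than one of the names would appear once per matching name, whereas the extract-based version keeps each row once with the first match.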