Home > Blockchain >  merging 2 dfs in 1("left join") in pandas based on "contains" values
merging 2 dfs in 1("left join") in pandas based on "contains" values

Time:08-26

I have 2 dfs

df_1

Nº.do Incidente Status  Description Per_Extracao
0   IN6948271   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...    DE : 2022/01/05 ATÉ : 2022/12/08
1   IN6948304   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...    DE : 2022/01/05 ATÉ : 2022/12/08
2   IN6948307   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
3   IN6948309   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
4   IN6948310   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08
5   IN6948311   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_ADAT_SPRK_EX...   DE : 2022/01/05 ATÉ : 2022/12/08

df_2

    JOB_NAME    JOB_STREAM_NAME
0   PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D P26_AACD_FAC_TOD
1   PP_SASG_GD9822  P26_AACE_U08
2   PP_AACE_R4539   P26_AACE_U09
3   PP_AACE_R4539_CONS_JUNC P26_AACE_U08
4   PP_AACE_R4539_FMRC_TD_01    P26_AACE_U08
5   PP_AACE_R4539_FMRC_TD_02    P26_AACE_U08

I'm trying to merge then based on the value of JOB_NAME in df_2 founded in df_1(Description columns), it's a kind of a left join on sql, because I need everything from df_2

the output should be something like this

merged_df

Nº.do Incidente Status  Description Per_Extracao JOB_NAME JOB_STREAM_NAME
0   IN6948271   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...    DE : 2022/01/05 ATÉ : 2022/12/08 PP_SASG_GD9822  P26_AACE_U08
1   IN6948304   ENCERRADO   GR26 D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...    DE : 2022/01/05 ATÉ : 2022/12/08 PP_AACE_R4539 P26_AACE_U09
2   NaN   NaN   NaN   NaN PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D   P26_AACD_FAC_TOD
3   NaN   NaN   NaN   NaN PP_AACE_R4539_CONS_JUNC P26_AACE_U08
4   NaN   NaN   NaN   NaN PP_AACE_R4539_FMRC_TD_01 P26_AACE_U08 
5   NaN   NaN   NaN   NaN PP_AACE_R4539_FMRC_TD_02 P26_AACE_U08

tried to use some merge commands, but, none has worked, for example:

df_test = df_2.merge(df_2,left_on='JOB_NAME',right_on=df_1['Description'].str.extract(df_2['JOB_NAME']),how='left')

and

df_test = df_2.merge(df_2,left_on='JOB_NAME',right_on=df_1['Description'].str.contains(df_2['JOB_NAME']),how='left')

CodePudding user response:

You can use the same answer as in the other question, just add the keyword right to it on how the merge should be done.

pat = "|".join(df2['JOB_NAME'])
df1.insert(0, 'merge_key', df1['Description'].str.extract("(" pat ")", expand=False))

df = df1.merge(df2, how='right', left_on='merge_key', right_on='JOB_NAME').drop('merge_key', axis=1)
print(df)
       Nº.do Incidente Status                                  Description  jobName                      Per_Extracao                                 JOB_NAME   JOB_STREAM_NAME
0        NaN              NaN                                          NaN      NaN                               NaN  PP_AACD_NR_D8706_TIHIBRIDA_PROC_EXCUC_D  P26_AACD_FAC_TOD
1  ENCERRADO             GR26  D.I.T.I. >>> ABEND NO JOB PP_SASG_GD9822...      NaN  DE : 2022/01/05 ATÉ : 2022/12/08                           PP_SASG_GD9822      P26_AACE_U08
2  ENCERRADO             GR26  D.I.T.I. >>> ABEND NO JOB PP_AACE_R4539 ...      NaN  DE : 2022/01/05 ATÉ : 2022/12/08                            PP_AACE_R4539      P26_AACE_U09
3        NaN              NaN                                          NaN      NaN                               NaN                  PP_AACE_R4539_CONS_JUNC      P26_AACE_U08
4        NaN              NaN                                          NaN      NaN                               NaN                 PP_AACE_R4539_FMRC_TD_01      P26_AACE_U08
5        NaN              NaN                                          NaN      NaN                               NaN                 PP_AACE_R4539_FMRC_TD_02      P26_AACE_U08

  • Related