Home > Enterprise >  Comparing Two dataframes of pandas on the basis of condition of two columns
Comparing Two dataframes of pandas on the basis of condition of two columns

Time:05-05

I have two dataframe in which columns are different, And I need to search and the map the data in new file. I am sharing the dataframes and also desired output

DF1

    Ref.X               Ref.Y            Ref.No
    255623              145244651        XF1234
    45454544            NA               DF7886
    1300256655          3511614646       DF8948
    5546446166          NA               VX4578
    1230001556          NA               GL8947
    4548754545          4548545877       DF7896
    3652656             NA               XF4895
    7884045455          45681247         GL8597
    148428              1242154661       XF3564

DF2

    Type      STR1            STR2
    ABJH      45656644        145244651
    ABJH      844654989       1242154661
    BHJH      51654241654     3511614646
    BHJH      1230001556      255565656
    ABJH      5546446166      8946549849
    ABJH      45454544        544895655
    BHJH      3652656         565464447

Now I need to a output from above two dataframes.

Conditions

  1. Need to search Ref.Y in STR2, if available then pick the "Type" for output.
  2. if Ref.Y is "NA" then need to search Ref.X in STR1, if available then pick the "Type" for output.
  3. Also I need "Ref.No" in desired output correspond to the "Type" Ref.X or Ref.Y found in DF2

Desired Output

    Ref.X               Ref.Y            Ref.No     Int_Type
    255623              145244651        XF1234     ABJH
    45454544            NA               DF7886     ABJH
    1300256655          3511614646       DF8948     BHJH
    5546446166          NA               VX4578     ABJH
    1230001556          NA               GL8947     BHJH
    3652656             NA               XF4895     BHJH
    148428              1242154661       XF3564     ABJH

Thanks.

CodePudding user response:

You can try repace NA string to <NA> type then use np.where

df1 = df1.replace({'NA': pd.NA})

df1['Int_Type'] = np.where(df1['Ref.Y'].isna(),
                       df1['Ref.X'].map(dict(zip(df2['STR1'], df2['Type']))),
                       df1['Ref.Y'].map(dict(zip(df2['STR2'], df2['Type']))))

Or you can directly compare with NA string:

df1['Int_Type'] = np.where(df1['Ref.Y'].eq('NA'),
        Ref.X         Ref.Y  Ref.No  Int_Type
0      255623  1.452447e 08  XF1234  ABJH
1      148428  1.242155e 09  XF3564  ABJH
2  1300256655  3.511615e 09  DF8948  BHJH
3  1230001556           NaN  GL8947  BHJH
4  5546446166           NaN  VX4578  ABJH
5    45454544           NaN  DF7886  ABJH
6     3652656           NaN  XF4895  BHJH
7  4548754545  4.548546e 09  DF7896   NaN
8  7884045455  4.568125e 07  GL8597   NaN

To get Ref.No correspond to the "Type" Ref.X or Ref.Y found in DF2, you can drop the NAN column in Int_Type

df1 = df1.dropna(subset='Int_Type')
print(df1)

        Ref.X         Ref.Y  Ref.No Int_Type
0      255623  1.452447e 08  XF1234     ABJH
1      148428  1.242155e 09  XF3564     ABJH
2  1300256655  3.511615e 09  DF8948     BHJH
3  1230001556           NaN  GL8947     BHJH
4  5546446166           NaN  VX4578     ABJH
5    45454544           NaN  DF7886     ABJH
6     3652656           NaN  XF4895     BHJH
  • Related