Home > Software design >  Join two dataframes tables applying condition on similar values of multiple column
Join two dataframes tables applying condition on similar values of multiple column

Time:11-20

I have the following two dataframes with different size.

df1: (with 40,000 data)
Date Latitude Longitude Car_Accident
2/10/22 43.69  -79.4     40
2/10/22 43.69  -79.4    34
2/20/22 43.68  -79.0     30
2/20/22 43.50  -79.1     35
............................
..............................

df2: (with 20,000 data)
Date Latitude Longitude Route
2/20/22 43.68  -79.0     111
2/20/22 43.50  -79.1    123
2/10/22 43.69  -79.4     124
2/10/22 43.69  -79.4     124
............................
.............................

I want to merge them applying condition on three different columns.

if the value of df1['Date']==df2['Date'] & df1[Latitude]==df2[Latitude] & df1[Longitude]==df2[Longitude]

then we will add the column with having data who satisfied this condition

Example

final df:

Date   Latitude Longitude  Car_Accident  Route

2/20/22 43.50    -79.1            35      123
2/20/22 43.68    -79.0            30       111
2/10/22 43.69    -79.4            40       124
2/10/22 43.69    -79.4            34       124

Method I tried:

I tried inner join as well as indexing. Sometimes I had memory error, sometimes it gives null table or left table if I use left join.pandas.merge(df1, df2, on=[Date,Lat,Long]) or dataframe.mergeanything is not giving proper answer.

Problem I faced:

In inner join cartesian product is consuming huge computation process and also I don't is this the right method.

CodePudding user response:

Using pandas.DataFrame.drop_duplicates right after the left merge should give you the result you're looking for:

final_df= (
            df1.merge(df2, on=["Date", "Latitude", "Longitude"], how="left")
               .drop_duplicates(ignore_index=True)
           )

# Output :

print(final_df)

      Date  Latitude  Longitude  Car_Accident  Route
0  2/10/22     43.69      -79.4            40    124
1  2/10/22     43.69      -79.4            34    124
2  2/20/22     43.68      -79.0            30    111
3  2/20/22     43.50      -79.1            35    123
  • Related