Home > Enterprise >  Extracting column from dataframe based on matching three column values in another dataframe
Extracting column from dataframe based on matching three column values in another dataframe

Time:02-22

I have two dataframes :

import pandas as pd

df1 = pd.DataFrame(columns={'Date', 'latitude' , 'longitude' , 'tg'})

df1.head(5).to_dict() : 

{'Date': {0: Timestamp('1984-01-01 00:00:00'), 1: Timestamp('1984-01-02 
00:00:00'), 2: Timestamp('1984-01-03 00:00:00'), 3: Timestamp('1984-01-04 00:00:00'), 4: Timestamp('1984-01-05 00:00:00')}, 'longitude': {0: 13.875,
 1: 13.875, 2: 13.875, 3: 13.875, 4: 13.875}, 'tg': {0: 5.339999675750732, 
1: 2.559999942779541, 2: 4.019999980926514, 3: -0.07000000029802322, 4: 
-2.259999990463257}, 'latitude': {0: 49.125, 1: 49.125, 2: 49.125, 3: 
49.125, 4: 49.125}}



df2 = pd.DataFrame(columns={'Date', 'latitude_rr' , 'longitude_rr' , 'Runoff' , 'rr'})

df2.head(5).to_dict() : 

{'Date': {0: Timestamp('1984-01-01 00:00:00'), 1: Timestamp('1984-01-02 
00:00:00'), 2: Timestamp('1984-01-03 00:00:00'), 3: Timestamp('1984-01-04 
00:00:00'), 4: Timestamp('1984-01-05 00:00:00')}, 'latitude_rr': {0: 49.125, 
1: 49.125, 2: 49.125, 3: 49.125, 4: 49.125}, 'longitude_rr': {0: 13.875, 1: 
13.875, 2: 13.875, 3: 13.875, 4: 13.875}, 'Runoff': {0: 0.3808451, 1: 
0.3808451, 2: 0.6760563, 3: 1.464789, 4: 2.456338}, 'rr': {0: 0.0, 1: 0.0, 
2: 2.799999952316284, 3: 2.200000047683716, 4: 0.4000000059604645}}

I am trying to get the 'tg' values when df1.Date equals df2.Date and df1.latitude equals df2.latitude and df1.longitude equals df2.longitude then add them to df2.

df1 and df2 don't have the same size. I tried :

df3 = df1[df1.latitude.isin(df2.latitude_rr) & df1.longitude.isin(df2.longitude_rr) & df1.Date.isin(df2.Date)].tg

I tried to add that to df2 but since df2 is larger than df1 , it leads to wrong matches.

I am new to Pandas so any help is greatly appreciated. Thank you!

CodePudding user response:

If it's sufficient, use df.merge:

out = df1.merge(df2.rename(columns={'latitude_rr': 'latitude', 
                                    'longitude_rr': 'longitude'}),
                on=['Date', 'latitude', 'longitude'])
print(out)

# Output
        Date  longitude    tg  latitude    Runoff   rr
0 1984-01-01     13.875  5.34    49.125  0.380845  0.0
1 1984-01-02     13.875  2.56    49.125  0.380845  0.0
2 1984-01-03     13.875  4.02    49.125  0.676056  2.8
3 1984-01-04     13.875 -0.07    49.125  1.464789  2.2
4 1984-01-05     13.875 -2.26    49.125  2.456338  0.4

If it's not sufficient, you can use KDTree. This answer can help you.


First merge your 2 dataframes only by date because it's not a good choice to merge on latitude and longitude (float precision). You need a function like np.close to match coordinates:

out = df1.merge(df2, on='Date')

m1 = np.isclose(out['latitude'], out['latitude_rr'], atol=0.00001)
m2 = np.isclose(out['longitude'], out['longitude_rr'], atol=0.00001)
out = out[m1 & m2]

You have to adjust the tolerance atol

  • Related