I have two dataframes :
import pandas as pd
df1 = pd.DataFrame(columns={'Date', 'latitude' , 'longitude' , 'tg'})
df1.head(5).to_dict() :
{'Date': {0: Timestamp('1984-01-01 00:00:00'), 1: Timestamp('1984-01-02
00:00:00'), 2: Timestamp('1984-01-03 00:00:00'), 3: Timestamp('1984-01-04 00:00:00'), 4: Timestamp('1984-01-05 00:00:00')}, 'longitude': {0: 13.875,
1: 13.875, 2: 13.875, 3: 13.875, 4: 13.875}, 'tg': {0: 5.339999675750732,
1: 2.559999942779541, 2: 4.019999980926514, 3: -0.07000000029802322, 4:
-2.259999990463257}, 'latitude': {0: 49.125, 1: 49.125, 2: 49.125, 3:
49.125, 4: 49.125}}
df2 = pd.DataFrame(columns={'Date', 'latitude_rr' , 'longitude_rr' , 'Runoff' , 'rr'})
df2.head(5).to_dict() :
{'Date': {0: Timestamp('1984-01-01 00:00:00'), 1: Timestamp('1984-01-02
00:00:00'), 2: Timestamp('1984-01-03 00:00:00'), 3: Timestamp('1984-01-04
00:00:00'), 4: Timestamp('1984-01-05 00:00:00')}, 'latitude_rr': {0: 49.125,
1: 49.125, 2: 49.125, 3: 49.125, 4: 49.125}, 'longitude_rr': {0: 13.875, 1:
13.875, 2: 13.875, 3: 13.875, 4: 13.875}, 'Runoff': {0: 0.3808451, 1:
0.3808451, 2: 0.6760563, 3: 1.464789, 4: 2.456338}, 'rr': {0: 0.0, 1: 0.0,
2: 2.799999952316284, 3: 2.200000047683716, 4: 0.4000000059604645}}
I am trying to get the 'tg' values when df1.Date equals df2.Date and df1.latitude equals df2.latitude and df1.longitude equals df2.longitude then add them to df2.
df1 and df2 don't have the same size. I tried :
df3 = df1[df1.latitude.isin(df2.latitude_rr) & df1.longitude.isin(df2.longitude_rr) & df1.Date.isin(df2.Date)].tg
I tried to add that to df2 but since df2 is larger than df1 , it leads to wrong matches.
I am new to Pandas so any help is greatly appreciated. Thank you!
CodePudding user response:
If it's sufficient, use df.merge
:
out = df1.merge(df2.rename(columns={'latitude_rr': 'latitude',
'longitude_rr': 'longitude'}),
on=['Date', 'latitude', 'longitude'])
print(out)
# Output
Date longitude tg latitude Runoff rr
0 1984-01-01 13.875 5.34 49.125 0.380845 0.0
1 1984-01-02 13.875 2.56 49.125 0.380845 0.0
2 1984-01-03 13.875 4.02 49.125 0.676056 2.8
3 1984-01-04 13.875 -0.07 49.125 1.464789 2.2
4 1984-01-05 13.875 -2.26 49.125 2.456338 0.4
If it's not sufficient, you can use KDTree
. This answer can help you.
First merge your 2 dataframes only by date
because it's not a good choice to merge on latitude
and longitude
(float precision). You need a function like np.close
to match coordinates:
out = df1.merge(df2, on='Date')
m1 = np.isclose(out['latitude'], out['latitude_rr'], atol=0.00001)
m2 = np.isclose(out['longitude'], out['longitude_rr'], atol=0.00001)
out = out[m1 & m2]
You have to adjust the tolerance atol