I have a pandas dataframe that looks like this:
City | Lat | Long |
---|---|---|
Moscow | 55.7 | 37.3 |
Paris | 48.8 | 2.2 |
New York | 40.7 | -74.0 |
Moscow | 55.6 | 37.4 |
Paris | 48.9 | 2.3 |
New York | 40.8 | -74.1 |
Moscow | 55.8 | 37.2 |
Paris | 48.6 | 2.0 |
New York | 40.9 | -74.2 |
Let's say these lat/long values are completely inaccurate, so I want to replace them with actual coordinates of these cities.
I have another DataFrame (or a dictionary, if needed) that stores accurate coordinates for each City:
City | Lat | Long |
---|---|---|
Moscow | 55.751244 | 37.395744 |
Paris | 48.858093 | 2.294694 |
New York | 40.712772 | -74.006058 |
What's the easier way to update Lat / Long values for each City in the first table using values from the 2nd table?
FYI, the actual dataset I am working with includes ~ 2 million rows with approximately 500 unique values in the 'City' field.
CodePudding user response:
Set the index of both dataframes to the same column ('City') and update df
with df2
, to override df1's values:
df1 = df1.set_index('City')
df2 = df2.set_index('City')
df1.update(df2)
df1
Lat Long
City
Moscow 55.751244 37.395744
Paris 48.858093 2.294694
New York 40.712772 -74.006058
Moscow 55.751244 37.395744
Paris 48.858093 2.294694
New York 40.712772 -74.006058
Moscow 55.751244 37.395744
Paris 48.858093 2.294694
New York 40.712772 -74.006058
CodePudding user response:
Try:
df_inaccurate.loc[df_inaccurate['City'].isin(df_accurate['City']),['Lat','Long']] = df_accurate[['Lat','Long']]
If you could provide a code of your dataframes instead of tables, I could test it. But this should work.