Home > Net >  Merge two dataframes of different lengths with matching ID and fill NaN values of main dataframe on
Merge two dataframes of different lengths with matching ID and fill NaN values of main dataframe on

Time:01-01

I have two dataframes, the main dataframe has two columns for Lat and Long some of which have values and some of which are NaN. I have another dataframe that is a subset of this main dataframe with Lat and Long filled in with values. I'd like to fill in the main DataFrame with these values based on matching ID.

Main DataFrame:

        ID      Lat         Long
0   9547507704  33.853682   -80.369867
1   9777677704  32.942332   -80.066165
2   5791407702  47.636067   -122.302559
3   6223567700  34.224719   -117.372550
4   9662437702  42.521828   -82.913680
... ... ... ...
968552  4395967002  NaN NaN
968553  6985647108  NaN NaN
968554  7996438405  NaN NaN
968555  9054647103  NaN NaN
968556  9184687004  NaN NaN

DataFrame to fill:

    ID          Lat         Long
0   2392497107  36.824257   -76.272486
1   2649457102  37.633918   -77.507746
2   2952437110  37.511077   -77.528711
3   3379937304  39.119430   -77.569008
4   3773127208  36.909731   -76.070420
... ... ... ...
23263   9512327001  37.371059   -79.194838
23264   9677417002  38.406665   -78.913133
23265   9715167306  38.761194   -77.454184
23266   9767568404  37.022287   -76.319882
23267   9872047407  38.823017   -77.057818

The two dataframes are of different lengths.

EDIT for clarification: I need to replace the NaN in the Lat & Long columns of the main DataFrame with the Lat & Long from the subset if ID matches in both DataFrames. My DataFrames are both >60 columns, I am only trying to replace the NaN for those two columns.

Edit:

I went with this mapping solution although it isn't exactly what I'm looking for, I know there is a much more simple solution.

#mapping coordinates to NaN values in main 
m = dict(zip(fill_df.ID,fill_df.Lat))
main_df.Lat = main_df.Lat.fillna(main_df.ID.map(m))
n = dict(zip(fill_df.ID,fill_df.Long))
main_df.Long = main_df.Long.fillna(main_df.ID.map(n))

CodePudding user response:

new_df = pd.merge(main_df, sub_df, how='left', on='ID')

I guess the left join will do the job.

CodePudding user response:

One approach is to use DataFrame.combine_first. This method aligns DataFrames on index and columns, so you need to set ID as the index of each DataFrame, call df_main.combine_first(df_filler), then reset ID back into a column. (Seems awkward; there's probably a more elegant approach.)

Assuming your main DataFrame is named df_main and your DataFrame to fill is named df_filler:

df_main.set_index('ID').combine_first(df_filler.set_index('ID')).reset_index()

CodePudding user response:

This should do the trick:

import math
A = pd.DataFrame({'ID' : [1, 2, 3], 'Lat':[4, 5, 6], 'Long': [7, 8, float('nan')]})
B = pd.DataFrame({'ID' : [2, 3], 'Lat':[5, 6], 'Long': [8, 9]})
print('Old table:')
print(A)
print('Fix table:')
print(B)
for i in A.index.to_list():
    for j in B.index.to_list():
        if not A['ID'][i] == B['ID'][j]:
            continue
        if math.isnan(A['Lat'][i]):
            A.at[i, 'Lat'] = B['Lat'][j] 
        if math.isnan(A['Long'][i]):
            A.at[i, 'Long'] = B['Long'][j]
print('New table:')
print(A)

Returns:

   ID  Lat  Long
0   1    4   7.0
1   2    5   8.0
2   3    6   NaN
Fix table:
   ID  Lat  Long
0   2    5     8
1   3    6     9
New table:
   ID  Lat  Long
0   1    4   7.0
1   2    5   8.0
2   3    6   9.0

Not very elegant but gets the job done :)

CodePudding user response:

A.set_index('ID').fillna(B[['ID', 'Lat', 'Long']].set_index('ID')).reset_index()

Example:

>>> A
   ID  Lat  Long  Dont Update
0   1    4   7.0          7.0
1   2    5   8.0          8.0
2   3    6   NaN          NaN
>>> B
   ID  Lat  Long  Dont Update
0   2    5     8           10
1   3    6     9           10
>>> A.set_index('ID').fillna(B[['ID', 'Lat', 'Long']].set_index('ID')).reset_index()
   ID  Lat  Long  Dont Update
0   1    4   7.0          7.0
1   2    5   8.0          8.0
2   3    6   9.0          NaN
  • Related