I have one large dataframe with a few missing values, and a second, smaller dataframe that contains just the IDs of the rows with missing values together with the values that should go there. What is the best way to fill those missing values from the smaller table, matching rows by ID? I can do it by iterating over the rows and checking whether each ID exists in the other table, but that is slow. Given the example dataframes below, what is the best way to do this?
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", np.nan, "c", "d", np.nan]})
df2 = pd.DataFrame({"A": [2, 5], "B": ["b", "e"]})
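For reference, the row-by-row approach described above might look something like this. It is a hypothetical reconstruction (the actual loop isn't shown); it is useful mainly as a correctness baseline for the vectorized answers, since it does one boolean lookup into df2 per missing row.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", np.nan, "c", "d", np.nan]})
df2 = pd.DataFrame({"A": [2, 5], "B": ["b", "e"]})

# Hypothetical slow version: for every missing B in df1,
# search df2 for a row with the same ID in column A.
filled = df1.copy()
for idx, row in df1.iterrows():
    if pd.isna(row["B"]):
        match = df2.loc[df2["A"] == row["A"], "B"]
        if not match.empty:
            # take the first match if the ID happens to repeat in df2
            filled.at[idx, "B"] = match.iloc[0]

print(filled)
```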
Answer:
You could set A as the index in both dataframes, call fillna, and then reset_index:
>>> df1.set_index("A").fillna(df2.set_index("A")).reset_index()
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
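A self-contained version of the set_index/fillna one-liner, runnable as a script. Passing a DataFrame to fillna fills a NaN only where the other frame has a value at the same index label and column, so indexing both frames by A is what makes the rows line up by ID. The sketch assumes each ID appears at most once in df2.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", np.nan, "c", "d", np.nan]})
df2 = pd.DataFrame({"A": [2, 5], "B": ["b", "e"]})

# Index both frames by the ID column so fillna aligns rows by ID;
# existing (non-NaN) values in df1 are left untouched.
result = (
    df1.set_index("A")
    .fillna(df2.set_index("A"))
    .reset_index()  # move A back from the index to a regular column
)
print(result)
```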
Answer:
# create a dictionary from df2 mapping each ID (column A) to its value (column B)
d = dict(df2[['A', 'B']].values)
d
# using mask, replace B where it is NaN with the value found in the dictionary
df1['B'] = df1['B'].mask(df1['B'].isna(), df1['A'].map(d))
df1
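The mask call above replaces B wherever it is NaN with the mapped value, which is exactly what fillna does when given a Series of the same index. A minimal equivalent sketch, assuming the IDs in df2 are unique (with duplicates, dict keeps the last occurrence):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", np.nan, "c", "d", np.nan]})
df2 = pd.DataFrame({"A": [2, 5], "B": ["b", "e"]})

# ID -> value lookup built from the two columns of df2
d = dict(zip(df2["A"], df2["B"]))

# map turns every ID in df1 into its df2 value (NaN where the ID is absent);
# fillna then uses those values only where B is currently missing,
# the same effect as df1["B"].mask(df1["B"].isna(), ...)
df1["B"] = df1["B"].fillna(df1["A"].map(d))
print(df1)
```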
Or, without building a dictionary:
# map through df2 indexed by column A; a Series works as an ID -> value lookup
df1['B'] = df1['B'].mask(df1['B'].isna(), df1['A'].map(df2.set_index('A')['B']))
df1
   A  B
0  1  a
1  2  b
2  3  c
3  4  d
4  5  e
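A third option, not used in the answers above, is a left merge on the ID column followed by fillna. This is a swapped-in technique, shown as a sketch: the suffix "_new" is an arbitrary name chosen here, and validate="many_to_one" makes pandas raise an error if an ID is repeated in df2 instead of silently duplicating rows of df1.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3, 4, 5], "B": ["a", np.nan, "c", "d", np.nan]})
df2 = pd.DataFrame({"A": [2, 5], "B": ["b", "e"]})

# Left-join df2 onto df1 by ID; df2's B column arrives as "B_new"
# ("_new" is an arbitrary, hypothetical suffix).
merged = df1.merge(
    df2, on="A", how="left", suffixes=("", "_new"), validate="many_to_one"
)

# Keep existing B values; take B_new only where B is missing.
merged["B"] = merged["B"].fillna(merged["B_new"])
result = merged.drop(columns="B_new")
print(result)
```

A left merge keeps the row order of df1, and the validate check turns the "IDs in df2 are unique" assumption into an explicit error rather than a silent one.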