Home > Software design >  Combining dataframes to replace missing values
Combining dataframes to replace missing values

Time:06-28

I am trying to merge two dataframes together to replace possible missing values. For example:

df1:
Fruit     Price
Apple        5
Banana       7
Orange       4

df2: 
Fruit     Price
Pear         3

result:
Fruit     Price
Apple        5
Banana       7
Orange       4
Pear         3

I've used the concat method to do this and it works, however when both dataframes have the same column instead of having both duplicate values I want to use the data from the first df and ignore the second df. For example:

df1:    
Fruit     Price
Apple        5
Banana       7
Orange       4
Pear         6

df2: 
Fruit     Price
Pear         3

result:
Fruit     Price
Apple        5
Banana       7
Orange       4
Pear         6

How can I do this so it works on both examples?

CodePudding user response:

You could use concat and chain drop_duplicates on your Fruit column. Just always make sure to put the dataframe you want the values to keep in case of duplicated as the first parameter in concat:

# Case 1
>>> pd.concat([df1b,df2]).drop_duplicates('Fruit')

    Fruit  Price
0   Apple      5
1  Banana      7
2  Orange      4
0    Pear      3

# Case 2
>>> pd.concat([df1,df2]).drop_duplicates('Fruit')

   Fruit  Price
0   Apple      5
1  Banana      7
2  Orange      4
3    Pear      6

Setup

df1 = pd.DataFrame({'Fruit':['Apple','Banana','Orange','Pear'],
                   'Price':[5,7,4,6]})
df1b = pd.DataFrame({'Fruit':['Apple','Banana','Orange'],
                   'Price':[5,7,4]})
df2 = pd.DataFrame({'Fruit':['Pear'],
                   'Price':[3]})

CodePudding user response:

here is one way to do it: merge df1 and df2 and keep only the ones where there is no match and the concatenate with the first one

When the Fruit already exists, the second will be empty and won't be merged

pd.concat([df1, 
           df2.merge(df1, on='Fruit', how='left', suffixes=('','_y')).query('Price_y.isnull()').drop(columns='Price_y')]
         )
    Fruit   Price
0   Apple   5
1   Banana  7
2   Orange  4
0   Pear    3
  • Related