How to combine two dataframes to override df_1 empty values with df_2 values (IF exist) AND keep all-CodePudding

I'm trying to merge 2 dfs in a particular way.

Input:

import pandas as pd
import numpy as np

df = pd.DataFrame([('US', 'apples', 10), ('US', 'oranges', np.nan), ('US', 'grapes', np.nan)], columns = ['country', 'product', 'price'])
df_2 = pd.DataFrame([('US', 'bananas', 10), ('US', 'oranges', 10), ('US', 'oranges', 15)], columns = ['country', 'product', 'price'])

df
  country  product  price
0      US   apples   10.0
1      US  oranges    NaN  # todo - here this line is w/o price
2      US   grapes    NaN
df_2
  country  product  price
0      US  bananas     10
1      US  oranges     10  # todo - here we have price_1
2      US  oranges     15  # todo - and price_2

What I'm looking for: I want to get dataframe where df and df_2 are combined BUT df_1 empty line for oranges replaced by 2 lines from df_2.

  country  product  price
0      US   apples   10.0
2      US   grapes    NaN
0      US  bananas   10.0
1      US  oranges   10.0
2      US  oranges   15.0

What I've tried: If I concat them, then I still have oranges with 0 instead of having only oranges from df_2

df_3 = pd.concat([df, df_2])
df_3
  country  product  price
0      US   apples   10.0
1      US  oranges    NaN  # todo - I want to omit this line
2      US   grapes    NaN
0      US  bananas   10.0
1      US  oranges   10.0
2      US  oranges   15.0

Is there're an easy way to do it? Appreciate any help!

CodePudding user response：

After concat, looks like you want to drop the NaN price rows if there exist non-NaN rows for the same ['country', 'product'].

We can use groupby() and count to find the ['country', 'product'] groups with non-NaN prices. Then, we apply dropna() on these groups.

df_3 = pd.concat([df, df_2])
groups = df_3.groupby(['country', 'product'])['price']
group_counts = groups.transform('count') 
pd.concat([df_3[group_counts > 0].dropna(), df_3[group_counts == 0]])

Note that count does not count NaN's, so if a ['country', 'product'] group only has NaN prices, its count will be 0. Thus, df_3[group_counts > 0] only selects groups that have non-NaN rows. Finally, we concat with the groups that only have NaN rows.

If you might have multiple NaN rows for the same ['country', 'product'] and want to combine them, change the last line to:

pd.concat([df_3[group_counts > 0].dropna(), df_3[group_counts == 0].drop_duplicates()])