Home > other >  Python: Merge two dataframes skipping some rows w.r.t. to column value
Python: Merge two dataframes skipping some rows w.r.t. to column value

Time:02-18

Have got dataframe df1 and df2 as below:

df1

Item        Space
Grape       0.125
Mango       0.125
Mango       0.125
Beetroot    0.125
Beetroot    0.125
Beetroot    0.125
Carrot      0.125
Carrot      0.125
Carrot      0.125
Carrot      0.125

df2

Table_No    Part    Dedicated_Item      
1           1       null                    
1           2       null                
1           3       null                
1           5       Potato      
1           6       null                
1           7       null                
1           8       null                
2           1       null                
2           2       Onion
2           3       null                
2           5       null                
2           6       null                
2           7       null        
2           8       null

Need to merge df1 and df2 only to rows with 'null' value in 'Dedicated_Item' column in df2. Note: Both df1 and df2 row order should not be changed. To be added as such. If df2 got extra rows left as such.

Expected Output:

Table_No    Part    Dedicated_Item      Item        Space
1           1       null                Grape       0.125       
1           2       null                Mango       0.125   
1           3       null                Mango       0.125
1           5       Potato      
1           6       null                Beetroot    0.125
1           7       null                Beetroot    0.125
1           8       null                Beetroot    0.125
2           1       null                Carrot      0.125   
2           2       Onion
2           3       null                Carrot      0.125
2           5       null                Carrot      0.125
2           6       null                Carrot      0.125
2           7       null        
2           8       null

Thanks in Advance!

CodePudding user response:

Assuming null are NaNs, you could use:

idx = df2[df2['Dedicated_Item'].isna()].index
df2.loc[idx, df1.columns] = df1.set_axis(idx[:len(df1)])

# or for a new dataframe:
# new = pd.concat([df2, df1.set_axis(idx[:len(df1)])], axis=1)

If you have literal null strings, use:

idx = df2[df2['Dedicated_Item'].eq('null')].index

output:

    Table_No  Part Dedicated_Item      Item  Space
0          1     1            NaN     Grape  0.125
1          1     2            NaN     Mango  0.125
2          1     3            NaN     Mango  0.125
3          1     5         Potato       NaN    NaN
4          1     6            NaN  Beetroot  0.125
5          1     7            NaN  Beetroot  0.125
6          1     8            NaN  Beetroot  0.125
7          2     1            NaN    Carrot  0.125
8          2     2          Onion       NaN    NaN
9          2     3            NaN    Carrot  0.125
10         2     5            NaN    Carrot  0.125
11         2     6            NaN    Carrot  0.125
12         2     7            NaN       NaN    NaN
13         2     8            NaN       NaN    NaN
  • Related