Home > database >  Join DataFrames on Condition Pandas
Join DataFrames on Condition Pandas

Time:07-19

I have the following two dataframes with binary values that I want to merge.

df1

    Action  Adventure   Animation   Biography   
0   0       1           0           0   
1   0       0           0           0   
2   1       0           0           0   
3   0       0           0           0   
4   1       0           0           0

df2

    Action  Adventure   Biography   Comedy  
0   0       0           0           0   
1   0       0           1           0   
2   0       0           0           0   
3   0       0           0           1   
4   1       0           0           0   

I want to join these two data frames in a way that the result has the distinct columns and if in one dataframe the value is 1 then the result has 1, if not it has 0.

Result

    Action  Adventure   Animation   Biography   Comedy
0   0       1           0           0           0
1   0       0           0           1           0
2   1       0           0           0           0
3   0       0           0           0           1 
4   1       0           0           0           0

I am stuck on this so I don not have a proposed solution.

CodePudding user response:

Let us add the two dataframes then clip the upper value:

df1.add(df2, fill_value=0).clip(upper=1).astype(int)

   Action  Adventure  Animation  Biography  Comedy
0       0          1          0          0       0
1       0          0          0          1       0
2       1          0          0          0       0
3       0          0          0          0       1
4       1          0          0          0       0

CodePudding user response:

Thinking it as set problem may give you solution. Have look to code.

print((df1 | df2).fillna(0).astype(int) | df2)

COMPLETE CODE:

import pandas as pd

df1 = pd.DataFrame(
    {
        'Action':[0, 0, 1, 0, 1],
        'Adventure':[1, 0, 0, 0, 0],
        'Animation':[0, 0, 0, 0, 0],
        'Biography':[0, 0, 0, 0, 0]
    }
)

df2 = pd.DataFrame(
    {
        'Action':[0, 0, 1, 0, 1],
        'Adventure':[1, 0, 0, 0, 0],
        'Animation':[0, 0, 0, 0, 0],
        'Biography':[0, 1, 0, 0, 0],
        'Comedy':[0, 0, 0, 1, 0]
    }
)

print((df1 | df2).fillna(0).astype(int) | df2)

OUTPUT:

   Action  Adventure  Animation  Biography  Comedy
0       0          1          0          0       0
1       0          0          0          1       0
2       1          0          0          0       0
3       0          0          0          0       1
4       1          0          0          0       0
  • Related