Compare 2 different columns of different dataframes-CodePudding

I have 2 dataframes let's say->

df1 =>

colA  colB  colC
0     1     2   
3     4     5 
6     7     8

df2 (same number of rows and columns) =>

colD  colE  colF
10     11     12    
13     14     15  
16     17     18

I want to compare columns from both dataframes , example ->

df1['colB'] < df2['colF']

Currently I am getting ->

ValueError: Can only compare identically-labeled Series objects

while comparing in ->

EDIT :

df1.loc[

    df1['colB'] < df2['colF']

],'set_something' = 1;

Any help how I can implement it ? Thanks

CodePudding user response：

You have the error because your series are not aligned (and might have duplicated indices)

If you just care about position, not indices use the underlying nummy array:

df1['colB'] < df2['colF'].to_numpy()

If you want to assign back in a column, make sure to transform the column full the other DataFrame to array.

df1['new'] = df1['colB'] < df2['colF'].to_numpy()

df2['new'] = df1['colB'].to_numpy() < df2['colF']

CodePudding user response：

This is a non-equi join; you should get more performance with some of binary search; conditional_join from pyjanitor does that under the hood:

# pip install pyjanitor
import pandas as pd
import janitor
df1.conditional_join(df2, ('colB', 'colF', '<')) 
   colA  colB  colC  colD  colE  colF
0     0     1     2    10    11    12
1     0     1     2    13    14    15
2     0     1     2    16    17    18
3     3     4     5    10    11    12
4     3     4     5    13    14    15
5     3     4     5    16    17    18
6     6     7     8    10    11    12
7     6     7     8    13    14    15
8     6     7     8    16    17    18

If it is based on an equality (df1.colB == df2.colF), then pd.merge should suffice and is efficient