Home > OS >  Compare 2 different columns of different dataframes
Compare 2 different columns of different dataframes

Time:10-11

I have 2 dataframes let's say->

df1 =>

colA  colB  colC
0     1     2   
3     4     5 
6     7     8   

df2 (same number of rows and columns) =>

colD  colE  colF
10     11     12    
13     14     15  
16     17     18   

I want to compare columns from both dataframes , example ->

df1['colB'] < df2['colF']

Currently I am getting ->

ValueError: Can only compare identically-labeled Series objects

while comparing in ->

EDIT :

df1.loc[

    df1['colB'] < df2['colF']

],'set_something' = 1;

Any help how I can implement it ? Thanks

CodePudding user response:

You have the error because your series are not aligned (and might have duplicated indices)

If you just care about position, not indices use the underlying nummy array:

df1['colB'] < df2['colF'].to_numpy()

If you want to assign back in a column, make sure to transform the column full the other DataFrame to array.

df1['new'] = df1['colB'] < df2['colF'].to_numpy()

Or

df2['new'] = df1['colB'].to_numpy() < df2['colF']

CodePudding user response:

This is a non-equi join; you should get more performance with some of binary search; conditional_join from pyjanitor does that under the hood:

# pip install pyjanitor
import pandas as pd
import janitor
df1.conditional_join(df2, ('colB', 'colF', '<')) 
   colA  colB  colC  colD  colE  colF
0     0     1     2    10    11    12
1     0     1     2    13    14    15
2     0     1     2    16    17    18
3     3     4     5    10    11    12
4     3     4     5    13    14    15
5     3     4     5    16    17    18
6     6     7     8    10    11    12
7     6     7     8    13    14    15
8     6     7     8    16    17    18

If it is based on an equality (df1.colB == df2.colF), then pd.merge should suffice and is efficient

  • Related