I have 2 dataframes let's say->
df1 =>
colA colB colC
0 1 2
3 4 5
6 7 8
df2 (same number of rows and columns) =>
colD colE colF
10 11 12
13 14 15
16 17 18
I want to compare columns from both dataframes , example ->
df1['colB'] < df2['colF']
Currently I am getting ->
ValueError: Can only compare identically-labeled Series objects
while comparing in ->
EDIT :
df1.loc[
df1['colB'] < df2['colF']
],'set_something' = 1;
Any help how I can implement it ? Thanks
CodePudding user response:
You have the error because your series are not aligned (and might have duplicated indices)
If you just care about position, not indices use the underlying nummy array:
df1['colB'] < df2['colF'].to_numpy()
If you want to assign back in a column, make sure to transform the column full the other DataFrame to array.
df1['new'] = df1['colB'] < df2['colF'].to_numpy()
Or
df2['new'] = df1['colB'].to_numpy() < df2['colF']
CodePudding user response:
This is a non-equi join; you should get more performance with some of binary search; conditional_join from pyjanitor does that under the hood:
# pip install pyjanitor
import pandas as pd
import janitor
df1.conditional_join(df2, ('colB', 'colF', '<'))
colA colB colC colD colE colF
0 0 1 2 10 11 12
1 0 1 2 13 14 15
2 0 1 2 16 17 18
3 3 4 5 10 11 12
4 3 4 5 13 14 15
5 3 4 5 16 17 18
6 6 7 8 10 11 12
7 6 7 8 13 14 15
8 6 7 8 16 17 18
If it is based on an equality (df1.colB == df2.colF
), then pd.merge
should suffice and is efficient