I want to compare two dataframes in an element-wise manner in order to find which elements in dataframe one are less than those in dataframe two.
DataFrame One:
DataFrame Two:
In this case the only elements that should return True should be elements on Column 130_x for Indexes 0 and 1. The rest should return False.
Any help would be appreciated.
CodePudding user response:
You first need to make the column names match, either by assigning them directly:
df1.columns = df2.columns
or by replacing the suffix:
df1.columns = df1.columns.str.replace('_x','_y')
Once the labels match, compare element-wise with lt (eq would test for equality, not less-than):
mask = df1.lt(df2)
mask
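For reference, here is a minimal sketch of the rename-then-compare idea that leaves the original frames untouched. The sample values are taken from the data posted elsewhere in this thread; stripping the suffix with rsplit assumes that 126_x pairs with 126_y and so on, and the names left, right and mask are only illustrative.

```python
import pandas as pd

df1 = pd.DataFrame({
    "126_x": [0.0, 0.0],
    "127_x": [0.04898, 0.04898],
    "128_x": [0.359184, 0.359184],
    "129_x": [0.004082, 0.004082],
    "130_x": [0.134694, 0.134694],
})
df2 = pd.DataFrame({
    "126_y": [0.0, 0.006579],
    "127_y": [0.0, 0.0],
    "128_y": [0.175439, 0.269737],
    "129_y": [0.0, 0.0],
    "130_y": [0.166667, 0.223684],
})

# Assumption: columns correspond by the part before the last underscore.
# rename returns renamed copies, so df1 and df2 keep their original labels.
left = df1.rename(columns=lambda c: c.rsplit("_", 1)[0])
right = df2.rename(columns=lambda c: c.rsplit("_", 1)[0])

# Label-aligned, element-wise "less than".
mask = left.lt(right)
print(mask)
```

Because the comparison is aligned on labels, this variant would still pair the right columns if one frame had them in a different order.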
CodePudding user response:
You could simply use df1 < df2, but your column names need to be the same.
import pandas as pd
from io import StringIO
data1 = StringIO("""
126_x,127_x,128_x,129_x,130_x
0.0,0.04898,0.359184,0.004082,0.134694
0.0,0.04898,0.359184,0.004082,0.134694
""")
data2 = StringIO("""
126_y,127_y,128_y,129_y,130_y
0.0,0.0,0.175439,0.0,0.166667
0.006579,0.0,0.269737,0.0,0.223684
""")
df1 = pd.read_csv(data1)
df2 = pd.read_csv(data2)
# column headers need to be the same
df2.columns = df1.columns
print(df1 < df2)
#    126_x  127_x  128_x  129_x  130_x
# 0  False  False  False  False   True
# 1   True  False  False  False   True
"In this case the only elements that should return True should be elements on Column 130_x for Indexes 0 and 1. The rest should return False."
That's not correct: there is another cell (column 126_y, row 2, i.e. index 1) that is also True, since 0.006579 > 0.0.
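To double-check which cells come out True, one option is to list their (index, column) positions. This is only a sketch using the sample values from this answer; hits is an illustrative name, and the positional comparison via to_numpy() assumes both frames have the same shape and column order.

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "126_x": [0.0, 0.0],
    "127_x": [0.04898, 0.04898],
    "128_x": [0.359184, 0.359184],
    "129_x": [0.004082, 0.004082],
    "130_x": [0.134694, 0.134694],
})
df2 = pd.DataFrame({
    "126_y": [0.0, 0.006579],
    "127_y": [0.0, 0.0],
    "128_y": [0.175439, 0.269737],
    "129_y": [0.0, 0.0],
    "130_y": [0.166667, 0.223684],
})

# Compare by position so the differing column labels do not matter.
mask = df1.lt(df2.to_numpy())

# np.nonzero returns the row and column positions of every True cell.
rows, cols = np.nonzero(mask.to_numpy())
hits = [(int(mask.index[r]), mask.columns[c]) for r, c in zip(rows, cols)]
print(hits)
```

For the sample data this should list three cells: index 0 and index 1 in column 130_x, plus index 1 in column 126_x.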
CodePudding user response:
You don't need to modify the dataframes. You can use the underlying numpy array:
df1.lt(df2.values)
Output:
   126_x  127_x  128_x  129_x  130_x
0  False  False  False  False   True
1   True  False  False  False   True
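One caveat with the array approach: the comparison is purely positional, so it relies on both frames having the same shape and the same column order. A small sketch of a guard around it follows; lt_by_position is a hypothetical helper name, not a pandas function, and the data is the sample from this thread.

```python
import pandas as pd


def lt_by_position(left: pd.DataFrame, right: pd.DataFrame) -> pd.DataFrame:
    """Element-wise left < right, matching cells by position, not by label."""
    if left.shape != right.shape:
        raise ValueError(f"shape mismatch: {left.shape} vs {right.shape}")
    # to_numpy() drops the labels, so pandas cannot try to align them.
    return left.lt(right.to_numpy())


df1 = pd.DataFrame({
    "126_x": [0.0, 0.0],
    "127_x": [0.04898, 0.04898],
    "128_x": [0.359184, 0.359184],
    "129_x": [0.004082, 0.004082],
    "130_x": [0.134694, 0.134694],
})
df2 = pd.DataFrame({
    "126_y": [0.0, 0.006579],
    "127_y": [0.0, 0.0],
    "128_y": [0.175439, 0.269737],
    "129_y": [0.0, 0.0],
    "130_y": [0.166667, 0.223684],
})

result = lt_by_position(df1, df2)
print(result)
```

The result keeps the index and column labels of the left frame, which matches the output shown above.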