I'm new to pandas and i'm trying to understand, if there is a method to find out, if two values from one row in df1 are between two values from one row in df2.
Basically my df1 looks like this:
start | value | end
1 | TEST | 5
2 | TEST | 3
...
and my df2 looks like this:
start | value | end
2 | TEST2 | 10
3 | TEST2 | 4
...
Right now i've got it working with two loops:
for row in df1.iterrows():
for row2 in df2.iterrows():
if row2[1]["start"] >= row[1]["start"] and row2[1]["end"] <= row[1]["end"]:
print(row2)
but this doesn't feel like it's the pandas way to me.
What I'm expecting is that row number 2 from df2 is getting printed because 3 > 1 and 4 < 5, i.e.:
3 | TEST2 | 4
Is there a method to do this in the pandas kind of working?
CodePudding user response:
You could use a cross merge
to get all combinations of df1
and df2
rows, and filter using classical comparisons. Finally, get the indices and slice:
idx = (df1.merge(df2.reset_index(), suffixes=('1', '2'), how='cross')
.query('(start2 > start1) & (end2 < end1)')
['index'].unique()
)
df2.loc[idx]
NB. I am using unique
here to ensure that a row is selected only once, even if there are several matches
output:
start value end
1 3 TEST2 4