I have data frames
data = [[1, 5,2,10], [1, 5,-1,-1]]
df = pd.DataFrame(data, columns = ['START', 'END', 'g','g2'])
START END g g2
1 5 2 10
1 5 -1 -1
I want to select rows in which g is -1
and do a query with START and END against another data frame like this
data = [[2, 10,'foo'], [1, 5,'faa']]
df2 = pd.DataFrame(data, columns = ['START', 'END', 'data'])
START2 END2 data
2 10 foo
1 5 faa
The query is an overlapping
.query(START.between(START2,END2) | (END.between(START2,END2)
If an overlap is found save something in a new column in df1
START END g g2 pass_validation
1 5 2 10
1 5 -1 -1 true
I know how to select a row by condition, and I know how to do the query but I cant achieve it altogether.
CodePudding user response:
How about something like this?
df['pass_validation'] = df['g'].eq(-1) & (df['START'].between(df2['START2'], df2['END2']) | df['END'].between(df2['START2'], df2['END2']))
Output:
>>> df
START END g g2 pass_validation
0 1 5 2 10 False
1 1 5 -1 -1 True