I have a data frame (df_validation) in which I have done some validation. If all was ok, this should be empty. If not, rows with info like this will be found
CHR START END
1 1000 2000 # Imagine this failed
Now, I want to communicate to another data frame (df) that validation was ok except for the ones that appear in df_validation
This is df
CHR START END
1 1000 2000
2 1000 2000
3 1000 2000
4 1000 2000
5 1000 2000
and this is the result I would like to have
CHR START END Pass_validation?
1 1000 2000 False
2 1000 2000 True
3 1000 2000 True
4 1000 2000 True
5 1000 2000 True
How can I do this? The first idea was to merge both but having same column name, it adds a '_X' on left and '_Y' on right. The new right columns I don't really need, just a new column as in the example.
CodePudding user response:
Idea is use indicator=True
parameter for helper column _merge
and for False
for match compare for not equal both
. If is omit on
parameter is joined by intersection of columnsname in both dataFrames, here CHR
, START
and END
.
df2['Pass_validation?'] = df2.merge(df_validation,
indicator=True,
how='left')['_merge'].ne('both')
print (df2)
CHR START END Pass_validation?
0 1 1000 2000 False
1 2 1000 2000 True
2 3 1000 2000 True
3 4 1000 2000 True
4 5 1000 2000 True
Details:
print (df2.merge(df_validation, indicator=True, how='left'))
CHR START END _merge
0 1 1000 2000 both
1 2 1000 2000 left_only
2 3 1000 2000 left_only
3 4 1000 2000 left_only
4 5 1000 2000 left_only