This is my main dataframe that I want to filter.
first.seqnames first.start first.end first.width first.strand second.seqnames second.start second.end second.width second.strand
126457 chr1 10590184 10590618 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... * chr1 10730773 10731207 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... *
126461 chr1 10590958 10591541 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... * chr1 10731548 10732131 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... *
126544 chr1 10597414 10597918 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... * chr1 10738018 10738522 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... *
126576 chr1 10600437 10600904 CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT... * chr1 10741045 10741512 CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT... *
131172 chr1 11082133 11082593 TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT... * chr1 11245253 11245713 TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT... *
This is my conditional dataframe based on which I will filter:
Chrom Start End
0 chr1 10590184 10590618
1 chr1 10590958 10591541
2 chr1 10597414 10597918
I've tried the following logic to filter each row. But it's wrong; it is not comparing each row.
header_frame[header_frame['first.end'].isin(knee_df['End']) & header_frame['first.start'].isin(knee_df['Start'])]
I want only those rows in the 1st dataframe which exist in the 2nd dataframe.
CodePudding user response:
Assuming df1
and df2
the two dataframes, you can inner merge
:
df1.merge(df2,
left_on=['first.seqnames', 'first.start', 'first.end'],
right_on=['Chrom', 'Start', 'End'],
how='inner'
)[df1.columns]
output:
first.seqnames first.start first.end first.width first.strand second.seqnames second.start second.end second.width second.strand
0 chr1 10590184 10590618 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... * chr1 10730773 10731207 GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC... *
1 chr1 10590958 10591541 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... * chr1 10731548 10732131 CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA... *
2 chr1 10597414 10597918 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... * chr1 10738018 10738522 ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC... *