Home > OS >  Filter pandas dataframe rows based on multiple conditions
Filter pandas dataframe rows based on multiple conditions

Time:02-22

This is my main dataframe that I want to filter.

    first.seqnames  first.start first.end   first.width first.strand    second.seqnames second.start    second.end  second.width    second.strand
126457  chr1    10590184    10590618    GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...   *   chr1    10730773    10731207    GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...   *
126461  chr1    10590958    10591541    CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...   *   chr1    10731548    10732131    CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...   *
126544  chr1    10597414    10597918    ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...   *   chr1    10738018    10738522    ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...   *
126576  chr1    10600437    10600904    CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...   *   chr1    10741045    10741512    CTCGTTACCATGAAAGCTTTTTTAGCATTGATTTCATAACAGTCTT...   *
131172  chr1    11082133    11082593    TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...   *   chr1    11245253    11245713    TGAATCAGTGGTTTAATCTTCTTTGTTTACATCCCTTATTTCTTAT...   *

This is my conditional dataframe based on which I will filter:

    Chrom   Start   End
0   chr1    10590184    10590618
1   chr1    10590958    10591541
2   chr1    10597414    10597918

I've tried the following logic to filter each row. But it's wrong; it is not comparing each row.

header_frame[header_frame['first.end'].isin(knee_df['End']) & header_frame['first.start'].isin(knee_df['Start'])]

I want only those rows in the 1st dataframe which exist in the 2nd dataframe.

CodePudding user response:

Assuming df1 and df2 the two dataframes, you can inner merge:

df1.merge(df2,
          left_on=['first.seqnames', 'first.start', 'first.end'],
          right_on=['Chrom', 'Start', 'End'],
          how='inner'
         )[df1.columns]

output:

  first.seqnames  first.start  first.end                                        first.width first.strand second.seqnames  second.start  second.end                                       second.width second.strand
0           chr1     10590184   10590618  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...            *            chr1      10730773    10731207  GTTAATTATAGATAAATGGGCTAAAATTGCCTCTTGGTTTTGTAAC...             *
1           chr1     10590958   10591541  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...            *            chr1      10731548    10732131  CTTTCTTTTGCATACTTGTAGATTTTTCTTCTACTCTGGTTTAGGA...             *
2           chr1     10597414   10597918  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...            *            chr1      10738018    10738522  ATCATTAGGAGATTATTAAAATTTGGAGTGTGTTGGCTGGCCTCGC...             *
  • Related