Home > database >  How to search a specific set of columns using pandas
How to search a specific set of columns using pandas

Time:03-14

I have a dataframe as below

  CHROM        POS FORMAT HG02317 HG03195 HG01959 HG03197
chr1  241663902     GT     0/0     0/0     0/0     0/1
chr1  241663902     GT     0/0     0/0     0/0     0/0
chr2   47641559     GT     0/1     0/1     0/0     0/0
chrX    47641559    GT  1   0   1   0 

I tried to use

filtered= df[df['HG02317'].str.contains("0/1|1")]

but this removes all other rows which still contains 0/1|1. I want to check all columns from iloc[3:] for 0/1 or 1 and then delete it if not present. Also, as column names from [3:] keeps varying i cant use the below as well though it works
filtered= df.query('HG02317== "0/1" | HG01959 == "0/1"')

Which means i want to search by col index Using index ,
filtered= df[(df.loc[:, 3:] == '0/1')] gives error.

My expected Output is:

  CHROM        POS FORMAT HG02317 HG03195 HG01959 HG03197
chr1  241663902     GT     0/0     0/0     0/0     0/1
chr2   47641559     GT     0/1     0/1     0/0     0/0
chrX    47641559    GT  1   0   1   0 

Is there any work around for this in pandas? Thanks in advance

CodePudding user response:

First select all columns with omit first 3 use DataFrame.iloc, then compare by DataFrame.isin and test by DataFrame.any all rows if at least one value match:

#added 1 like string or 1 like number
filtered = df[df.iloc[:, 3:].isin(['0/1', '1', 1]).any(axis=1)]
print (filtered)
  CHROM        POS FORMAT HG02317 HG03195 HG01959 HG03197
0  chr1  241663902     GT     0/0     0/0     0/0     0/1
2  chr2   47641559     GT     0/1     0/1     0/0     0/0
3  chrX   47641559     GT       1       0       1       0
  • Related