Find rows with df.iterrows() and drop some based on condition

Time:03-21

I have a DataFrame that I want to iterate through row by row, but I can't figure out how to do that. If a row contains at least one item from a list of bad items, I want to remove that entire row.

This is a DataFrame named only_dna:

a_base a_base a_base a_base
A C G G
DUPE 0 ? NTC

The second row contains all the items I'm checking each row's values against. If a row contains any of them, I want to get rid of that row. I haven't figured out how to do this yet, and that is my question.

Here is a half-baked idea I've come up with. It obviously isn't going to work, and I'm wondering whether it's even the right line of thinking:

bad_data = ['?','0','DUPE','NTC']

rows = len(only_dna.axes[0])
cols = len(only_dna.axes[1])

for i, d in only_dna.iterrows():
    if only_dna.iloc[i].contains(bad_data):
        only_dna.drop.iloc[i]

CodePudding user response:

IIUC, you could create a boolean mask and filter out the rows with bad data:

mask = only_dna.apply(lambda row: any(x in bad_data for x in row), axis=1)
out = only_dna[~mask]

Output:

  a_base a_base.1 a_base.2 a_base.3
0      A        C        G        G
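
For anyone who wants to run this end to end, here is a minimal, self-contained sketch. The frame is rebuilt by hand, and the column names are an assumption taken from the output above. It shows two alternatives to the apply-based mask: a vectorized mask built with isin(), and a loop that stays closer to the original iterrows() idea by collecting the index labels first and dropping them in one call rather than modifying the frame while iterating.

```python
import pandas as pd

# Rebuild the example frame from the question. The column names are an
# assumption based on the output shown in the answer.
only_dna = pd.DataFrame(
    [["A", "C", "G", "G"], ["DUPE", "0", "?", "NTC"]],
    columns=["a_base", "a_base.1", "a_base.2", "a_base.3"],
)
bad_data = ["?", "0", "DUPE", "NTC"]

# Vectorized alternative: isin() marks every cell that holds a bad value,
# and any(axis=1) reduces that to one boolean per row.
mask = only_dna.isin(bad_data).any(axis=1)
out = only_dna[~mask]

# Loop-based alternative: gather the labels of the bad rows first,
# then drop them all at once after the loop has finished.
to_drop = [idx for idx, row in only_dna.iterrows() if row.isin(bad_data).any()]
out_loop = only_dna.drop(index=to_drop)

print(out)
```

Both variants should leave only the rows without any bad value. Note that the comparison is exact, so a numeric 0 in the data would not match the string '0' in bad_data.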