I have a DataFrame that I want to iterate through row by row. If a row contains at least one item from a list of bad items, I want to remove that entire row. What I can't figure out is how to iterate through each row and do that check.
This is a DataFrame named only_dna:
a_base | a_base | a_base | a_base |
---|---|---|---|
A | C | G | G |
DUPE | 0 | ? | NTC |
The second row contains all the items I'm checking each row's values against. If any of them appear in a row, I want to get rid of that row. However, I have not figured out how to do this yet, and that is my question.
Here is a half-baked idea that I've come up with. It obviously isn't going to work, and I'm wondering whether it's even the right line of thinking:
bad_data = ['?','0','DUPE','NTC']
rows = len(only_dna.axes[0])
cols = len(only_dna.axes[1])
for i, d in only_dna.iterrows():
    if only_dna.iloc[i].contains(bad_data):
        only_dna.drop.iloc[i]
CodePudding user response:
IIUC, you could create a boolean mask and filter out the rows with bad data:
mask = only_dna.apply(lambda row: any(x in bad_data for x in row), axis=1)
out = only_dna[~mask]
Output:
a_base a_base.1 a_base.2 a_base.3
0 A C G G
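As a possible alternative, here is a minimal sketch of the same filter using `DataFrame.isin`, which avoids the Python-level `lambda`. It assumes the frame holds strings (so `'0'` is the text "0", not the integer 0) and rebuilds the two example rows by hand with the column names from the output above. It also shows one way the loop idea from the question could be made to work: collect the index labels of the bad rows first and drop them in a single call. The name `bad_labels` is just an illustrative choice.

```python
import pandas as pd

# Rebuild the example frame (assumed to contain strings only).
only_dna = pd.DataFrame(
    [["A", "C", "G", "G"], ["DUPE", "0", "?", "NTC"]],
    columns=["a_base", "a_base.1", "a_base.2", "a_base.3"],
)
bad_data = ["?", "0", "DUPE", "NTC"]

# isin() flags every cell that matches a bad value;
# any(axis=1) reduces that to one True/False per row.
mask = only_dna.isin(bad_data).any(axis=1)
out = only_dna[~mask]

# Loop-based variant: gather the labels of bad rows, then drop them at once
# instead of modifying the frame while iterating over it.
bad_labels = [
    label for label, row in only_dna.iterrows() if row.isin(bad_data).any()
]
out_loop = only_dna.drop(index=bad_labels)

print(out)
```

Both variants should keep only the first row here. The mask version is usually preferable on larger frames, since `iterrows()` builds a Series for every row.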