If two separate cells in a pandas DataFrame don't contain a given text, drop the entire row?

Time:12-16

Hypothetical pandas DataFrame example:

'A'   'B'   'C'
A 1   B 1   1
A 2   B 1   2
A 3   B 1   3

Let's say I want to keep only the rows where column 'A' contains '1' and column 'B' contains '1'; any row that doesn't meet this condition gets dropped.

So the output dataframe looks like this:

'A'   'B'   'C'
A 1   B 1   1

My attempt was to iterate through each row in column A and B:

for i,j in df.iterrows():
    if "1" in (df['A']) & (df['B']):
        print()
    else:
        df.drop()

But I got this error instead:

TypeError: unsupported operand type(s) for &: 'str' and 'str'

Is there another way to do this?

CodePudding user response:

You can use Series.str.contains on columns A and B to build a boolean mask for each one: an item is True if the corresponding cell contains 1, and False otherwise. Then combine the two masks with & (which returns a new mask where each item is True only if both input masks are True at that position) and use the result to index the dataframe:

subset = df[df['A'].str.contains('1') & df['B'].str.contains('1')]

Output:

>>> subset
     A    B  C
0  A 1  B 1  1
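
For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The cell values ("A 1", "B 1", and so on) are an assumption reconstructed from the table in the question, and the names mask_a and mask_b are only illustrative. Passing regex=False is an optional addition not present in the answer above; it makes str.contains treat "1" as a literal substring rather than a regular expression.

```python
import pandas as pd

# Sample data reconstructed from the question's hypothetical table (assumed values).
df = pd.DataFrame({
    "A": ["A 1", "A 2", "A 3"],
    "B": ["B 1", "B 1", "B 1"],
    "C": [1, 2, 3],
})

# Element-wise boolean masks: True where the cell contains the substring "1".
# regex=False treats "1" as a literal string instead of a regex pattern.
mask_a = df["A"].str.contains("1", regex=False)
mask_b = df["B"].str.contains("1", regex=False)

# & combines the two masks element-wise; only rows where both are True are kept.
subset = df[mask_a & mask_b]
print(subset)
```

Note that a substring match on "1" would also keep cells such as "A 11" or "A 21", so a stricter comparison may be needed if the real data contains values like that.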