identify substring using str.contains by mixing AND and OR

Time:10-14

I am trying to identify substrings in a given string using str.contains while mixing OR and AND conditions.

I know that OR can be represented by |

str.contains("error|break|insufficient")

and that AND can be represented by &

str.contains("error|break|insufficient") & str.contains("status")

I would like to mix OR and AND together. For example, I want to identify strings that have "error" OR "break" OR ("insufficient" AND "status").

So a sentence like "error break insufficient" should be identified. Right now it is not, because there is no "status" in the sentence.

CodePudding user response:

One approach:

import pandas as pd

# toy data
s = pd.Series(["hello", "world", "simple", "error", "break", "insufficient something status", "status"])

# AND mask: both "insufficient" and "status" must be present
insufficient_and_status = s.str.contains("insufficient") & s.str.contains("status")

# OR mask: either "error" or "break" is enough
break_or_error = s.str.contains("error|break", regex=True)

# combine the two masks with OR
mask = break_or_error | insufficient_and_status

res = s[mask]
print(res)

Output

3                            error
4                            break
5    insufficient something status
dtype: object

Alternatively, using a single regex:

mask = s.str.contains("error|break|(?:insufficient.*status|status.*insufficient)", regex=True)

res = s[mask]
print(res)

The alternative relies on the fact that if the string contains both insufficient and status, then at least one of the patterns insufficient.*status or status.*insufficient matches (i.e. either insufficient occurs first or status does).
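As a quick sanity check, here is a minimal sketch comparing the two approaches on toy data. The extra rows (in particular "status then insufficient") are made up here to exercise both word orders, and the names mask_combined and mask_regex are only illustrative. The regex is written with a non-capturing group (?:...) because pandas emits a warning when a str.contains pattern has capturing groups. Note also that . does not match a newline by default, so the single-regex form assumes both words sit on the same line.

```python
import pandas as pd

# toy data (assumed for illustration); covers both word orders
s = pd.Series([
    "hello",
    "error",
    "break",
    "insufficient something status",
    "status then insufficient",
    "insufficient",
    "status",
])

# approach 1: combine boolean masks with | and &
mask_combined = s.str.contains("error|break") | (
    s.str.contains("insufficient") & s.str.contains("status")
)

# approach 2: one regex; the AND is spelled out as "either order"
mask_regex = s.str.contains(
    r"error|break|(?:insufficient.*status|status.*insufficient)", regex=True
)

# both masks should select exactly the same rows
print(mask_combined.equals(mask_regex))
print(s[mask_regex].tolist())
```

On this data the first print shows True and the second lists 'error', 'break', 'insufficient something status' and 'status then insufficient'. Rows containing only "insufficient" or only "status" are excluded by both approaches.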
