How can I put multiple conditions for detecting a pattern in pandas using regex

I have a DataFrame like this:

text

Is it possible to apply [NUM] times
Is it possible to apply [NUM] time
Called [NUM] hour ago
waited [NUM] hours
waiting [NUM] minute
waiting [NUM] minutes???
Are you kidding me !
Waiting?
I didn't like it

I want to detect rows that contain "[NUM] time", "[NUM] times", "[NUM] minute", "[NUM] minutes", "[NUM] hour" or "[NUM] hours". A row should also match if it contains "!" (one or more) or "??" (at least two question marks in a row).

So the result would look like this:

text                                   available

Is it possible to apply [NUM] times    True
Is it possible to apply [NUM] time     True
Called [NUM] hour ago                  True
waited [NUM] hours                     True
waiting [NUM] minute                   True
waiting [NUM] minutes???               True
Are you kidding me !                   True
Waiting?                               False
I didn't like it                       False

So I want something like this, but I don't know how to put all these conditions together:

df["available"] = df['text'].apply(lambda x: re.match(r'[\!* | \?  | [NUM] time | [NUM] hour | [NUM] minute]')

CodePudding user response:

You can use Series.str.contains with a regex:

import pandas as pd
df = pd.DataFrame({'text': [
    "Is it possible to apply [NUM] times",
    "Is it possible to apply [NUM] time",
    "Called [NUM] hour ago",
    "waited [NUM] hours",
    "waiting [NUM] minute",
    "waiting [NUM] minutes???",
    "Are you kidding me !",
    "Waiting?",
    "I didn't like it",
]})
# True where the text contains "[NUM] <unit>", a "!" or "??"
df['available'] = df['text'].str.contains(r'\[NUM]\s*(?:hour|minute|time)s?\b|!|\?{2}', regex=True)
# => df['available']
#     0     True
#     1     True
#     2     True
#     3     True
#     4     True
#     5     True
#     6     True
#     7    False
#     8    False
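
One thing to watch for if the real column has missing values: depending on the pandas version and the column dtype, str.contains may give NaN for those rows instead of False. Here is a minimal sketch, assuming a hypothetical Series named s with a None in it (not from the question), that passes na=False so the result is a plain boolean mask:

```python
import pandas as pd

pattern = r'\[NUM]\s*(?:hour|minute|time)s?\b|!|\?{2}'

# Hypothetical data with a missing value (not part of the question's DataFrame)
s = pd.Series(["waited [NUM] hours", None, "Waiting?"])

# na=False treats missing entries as "no match" instead of propagating NaN
mask = s.str.contains(pattern, regex=True, na=False)
print(mask.tolist())  # [True, False, False]
```

This matters mostly if you plan to use the result for boolean indexing, e.g. s[mask].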

Details of the regex:

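\[NUM] - the literal text [NUM] (only the opening bracket needs escaping)
\s* - zero or more whitespace characters
(?:hour|minute|time) - a non-capturing group matching hour, minute or time
s? - an optional s
\b - a word boundary
| - or
! - a ! character
| - or
\?{2} - two consecutive question marks (??? matches as well, since contains searches anywhere in the string).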
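
If you want to try the pattern outside pandas, the same breakdown can be written as a commented regex with re.VERBOSE. This is only a sketch: is_available is a hypothetical helper name, and the sample strings are a subset of the rows from the question. Note that it uses re.search, because re.match (as in the question's attempt) only tests at the start of the string:

```python
import re

# Same pattern as above, spelled out with re.VERBOSE so each part is commented
PATTERN = re.compile(r"""
    \[NUM] \s*                 # literal [NUM], then optional whitespace
    (?:hour|minute|time) s?    # the unit, singular or plural
    \b                         # word boundary
  | !                          # or an exclamation mark
  | \?{2}                      # or two consecutive question marks
""", re.VERBOSE)

def is_available(text):
    # search() scans the whole string; match() would only test the start
    return PATTERN.search(text) is not None

samples = [
    "Called [NUM] hour ago",
    "waiting [NUM] minutes???",
    "Are you kidding me !",
    "Waiting?",
    "I didn't like it",
]
results = [is_available(t) for t in samples]
print(results)  # [True, True, True, False, False]
```

For a DataFrame column, Series.str.contains is still the more direct option; the plain re version is mainly handy for checking the pattern on a few strings.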