This is the first part of my code, which detects a specific pattern in the strings:
import pandas as pd
df = pd.DataFrame({'text':["Is it possible to apply [NUM] times","Is it possible to apply [NUM] time",
"Called [NUM] hour ago","waited [NUM] hours","waiting [NUM] minute",
"waiting [NUM] minutes?","Are you kidding me !!","Waiting?",
"I didn't like it!!"]})
df['available'] = df['text'].str.contains(r'\[NUM]\s*(?:hour|minute|time)s?\b|!|\?{2}', regex=True)
And this is the second part that detects specific words in the strings:
my_domain = ['refund', 'cancel','cancelled','wait time', 'change', 'real person', 'disconnected', 'seriously', 'charge', 'agent', 'issue']
df['available'] = df['text'].str.contains('|'.join(my_domain), regex=True)
But because I'm doing it in two separate lines of code, the second assignment overwrites some of the results from the first, which I don't want. How can I combine them so that both checks are applied to the whole text at once?
CodePudding user response:
You might need | (element-wise or) to combine the two boolean masks:
df['available'] = df['text'].str.contains(r'\[NUM]\s*(?:hour|minute|time)s?\b|!|\?{2}', regex=True) | df['text'].str.contains('|'.join(my_domain), regex=True)
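An alternative, sketched here under a few assumptions, is to merge both patterns into a single regular expression so the column is written only once. The four sample rows, the names `pattern`, `words`, `mask_or` and `mask_single`, and the use of `re.escape` on the domain words are illustrative additions, not part of the original code.

```python
import re
import pandas as pd

# Small illustrative sample (not the full data from the question)
df = pd.DataFrame({'text': [
    "Is it possible to apply [NUM] times",
    "Are you kidding me !!",
    "Waiting?",
    "I want a refund",
]})

my_domain = ['refund', 'cancel', 'cancelled', 'wait time', 'change',
             'real person', 'disconnected', 'seriously', 'charge', 'agent', 'issue']

# First part: the [NUM] pattern, any "!", or a double "??"
pattern = r'\[NUM]\s*(?:hour|minute|time)s?\b|!|\?{2}'
# Second part: domain words; re.escape guards against regex metacharacters
words = '|'.join(map(re.escape, my_domain))

# Option 1: OR the two boolean masks element-wise
mask_or = (df['text'].str.contains(pattern, regex=True)
           | df['text'].str.contains(words, regex=True))

# Option 2: join both patterns into one alternation and search once
mask_single = df['text'].str.contains(f'{pattern}|{words}', regex=True)

df['available'] = mask_single
print(df)
```

Both options should produce the same mask. Note that the match is case-sensitive, so "Waiting?" is not flagged by either part; passing case=False to str.contains would change that.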