Code:
import pandas as pd
df = pd.DataFrame({'data': ['hey how r u', 'hello', 'hey abc d e f hey f', 'g h i i j k', 'hello how r u hello']})
vals = ['hey', 'hello']
I want to keep only the rows that contain exactly one word from the list vals. In this case, these would be 'hey how r u' and 'hello'.
What I tried:
def exactly_one(text):
    for v in vals:
        if text.count(v) > 1:
            return False
    return True
df = df[df['data'].contains('|'.join(vals)) & (exactly_one(df['data'].str))]
This breaks with an error.
CodePudding user response:
You can use str.count with a regex:
df[df['data'].str.count('|'.join(vals)).eq(1)]
Output:
          data
0  hey how r u
1        hello
Intermediate:
df['data'].str.count('|'.join(vals))
0    1
1    1
2    2
3    0
4    2
Name: data, dtype: int64
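One caveat: the pattern 'hey|hello' counts substring matches, so a word like 'they' would be counted as a hit for 'hey'. If whole-word matches are what you need, a minimal sketch is to wrap the alternatives in \b word boundaries and escape each entry with re.escape. The extra row 'they said hi' and the names pattern, counts and result are my own additions for illustration and are not part of the question's data.

```python
import re
import pandas as pd

# Same data as in the question, plus one hypothetical row ('they said hi')
# added only to show the substring problem.
df = pd.DataFrame({'data': ['hey how r u', 'hello', 'hey abc d e f hey f',
                            'g h i i j k', 'hello how r u hello',
                            'they said hi']})
vals = ['hey', 'hello']

# \b anchors the match at word boundaries; re.escape protects against
# regex metacharacters that might appear in vals.
pattern = r'\b(?:' + '|'.join(map(re.escape, vals)) + r')\b'

# Number of whole-word matches per row.
counts = df['data'].str.count(pattern)

# Keep only rows with exactly one match.
result = df[counts.eq(1)]
print(result)
```

With the plain '|'.join(vals) pattern the added row would be kept, because 'they' contains 'hey'. With the word-boundary pattern it is dropped.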