I have a data frame that looks similar to the following:
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    # each cell in 'comments' holds a list of strings
    'comments': (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})
df
employee_id country_code comments
0 123 US [good performer, due for raise, should be promoted]
1 456 CAN [bad performer, should be fired, speak to HR]
2 789 MEX [recently hired, needs training, shows promise]
I would like to filter the data frame to remove any rows whose comments column contains the string 'performer'. To do so, I'm using:
df = df[~df['comments'].str.contains('performer')]
But this returns an error:
TypeError: ufunc 'invert' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
Thanks in advance for any assistance you can give!
CodePudding user response:
If I understand correctly, you need to join each list in the comments column into a single string before calling str.contains:
df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})
# join each list into one space-separated string (this overwrites the lists)
df['comments'] = df['comments'].apply(lambda x: ' '.join(x))
# keep only the rows whose joined comments do not contain 'performer'
df = df[~df['comments'].str.contains('performer')]
df
CodePudding user response:
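Because each cell in your Series holds a list rather than a string, the vectorized .str methods do not apply: str.contains returns NaN for the list cells, which is why ~ then fails. You can use a list comprehension instead: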
df2 = df[[all('performer' not in comment for comment in comments)
          for comments in df['comments']]]
Output:
employee_id country_code comments
2 789 MEX [recently hired, needs training, shows promise]
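If you want to see why the original attempt fails, and prefer to stay with pandas methods, here is a minimal sketch. It first shows what str.contains gives back on the list column, then builds the mask with explode, which is an alternative to the list comprehension above rather than the same technique. The names raw, exploded, has_match and result are just illustrative, and na=False is an assumption about how you would want empty lists treated (as "no match").

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})

# The cells are lists, not strings, so this is expected to yield missing
# values rather than booleans; inverting that result with ~ is what fails.
raw = df['comments'].str.contains('performer')

# Alternative: give every comment its own row. The original row label is
# repeated in the index, so results can be grouped back per row afterwards.
exploded = df['comments'].explode()

# True for a row if at least one of its comments contains the substring.
# na=False treats missing values (e.g. from empty lists) as "no match".
has_match = (exploded.str.contains('performer', na=False)
                     .groupby(level=0)
                     .any())

# Keep the rows without a match; the lists in 'comments' stay intact.
result = df[~has_match]
```

For a small frame the list comprehension is probably simpler; the explode version mainly helps if you want regex patterns or the other str.contains options.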
CodePudding user response:
You could concatenate each list into one string first using apply, and then test for the word you're interested in:
df = df[~df['comments'].apply(lambda x: ' '.join(x)).str.contains('performer')]
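Here is the same idea as a self-contained sketch, split into steps so the intermediate mask is visible. The names joined, mask and filtered are only for illustration. Passing regex=False is my own assumption that you want a literal substring match; drop it if you need a regular expression.

```python
import pandas as pd

df = pd.DataFrame({
    'employee_id': [123, 456, 789],
    'country_code': ['US', 'CAN', 'MEX'],
    'comments': (['good performer', 'due for raise', 'should be promoted'],
                 ['bad performer', 'should be fired', 'speak to HR'],
                 ['recently hired', 'needs training', 'shows promise'])
})

# Temporary Series of joined strings; df['comments'] itself is not modified.
joined = df['comments'].apply(' '.join)

# Boolean mask: True where the joined text contains the literal substring.
mask = joined.str.contains('performer', regex=False)

# Keep the rows where the substring was not found.
filtered = df[~mask]
```

One thing to watch: because the comments are joined with a space, a multi-word search term could in principle match across the boundary between two separate comments. For a single word like 'performer' that is not a concern.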