I have the column below in a dataframe (each row is a person, and each cell holds a list of tokenised words).
Q395_R
[due, car, accident, year, ago, medical, condi...
[spending, time, loved, one, commute, able, co...
[initially, understanding, need, lockdown, ero...
[time, focus, exercise, le, sport, do, poured,..
[spending, time, family, realisation, need, ru...
I also have a list of words:
words395 = ['rising',
'accident',
'le',
'lasted',
'understanding',
'spending',
'adopted',
'raising',
'fabulous',
'loneliness',
'contract',....]
I would like to create a function that
- loops over each person (one per row)
- loops over each word in that row's list
- deletes the word from the cell if it is in the list words395
I am not sure how to combine the two loops to go through each person and each word. Can someone help with this?
Expected outcome:
Q395_R
[due, car, year, ago, medical, condi...
[time, loved, one, commute, able, co...
[initially, need, lockdown, ero...
[time, focus, exercise, sport, do, poured,..
[time, family, realisation, need, ru...
CodePudding user response:
Use a lambda function with a list comprehension, and convert the list words395 to a set first so that each membership test is fast:
s = set(words395)
df['Q395_R'] = df['Q395_R'].apply(lambda x: [y for y in x if y not in s])
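If the two loops are easier to follow when written out, the same filtering can be sketched as a named function. The sample rows below are shortened, made-up stand-ins for the truncated data in the question, and remove_words is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

# Hypothetical sample data modelled on the question; the real lists are longer.
df = pd.DataFrame({
    "Q395_R": [
        ["due", "car", "accident", "year", "ago", "medical"],
        ["spending", "time", "loved", "one", "commute", "able"],
        ["initially", "understanding", "need", "lockdown"],
    ]
})
words395 = ["rising", "accident", "le", "lasted", "understanding", "spending"]


def remove_words(tokens, stop_set):
    """Return a new list with every token that is not in stop_set."""
    kept = []
    for token in tokens:  # inner loop: each word in one cell
        if token not in stop_set:
            kept.append(token)
    return kept


s = set(words395)
# Outer loop: apply calls remove_words once per row (one person per row).
df["Q395_R"] = df["Q395_R"].apply(remove_words, args=(s,))
print(df["Q395_R"].tolist())
```

Here apply plays the role of the loop over people and the for statement is the loop over words, so no explicit nested loop over the dataframe is needed.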