Loop over each word in each row and remove words if in a list

Time:06-08

I have the column below in a dataframe. Each row is a person, and each cell contains a list of tokenised words.

Q395_R

[due, car, accident, year, ago, medical, condi...
[spending, time, loved, one, commute, able, co...
[initially, understanding, need, lockdown, ero...
[time, focus, exercise, le, sport, do, poured,..
[spending, time, family, realisation, need, ru...

I also have a list of words:

words395 = ['rising',
 'accident',
 'le',
 'lasted',
 'understanding',
 'spending',
 'adopted',
 'raising',
 'fabulous',
 'loneliness',
 'contract',....]

I would like to create a function that

  1. loops over each row (one row per person)
  2. loops over each word in that row
  3. deletes the word from the cell if it is in the list words395

I am not sure how to nest the two loops so that they go through each person and each word. Can someone help with this?

Expected outcome:

Q395_R
    
[due, car, year, ago, medical, condi...
[time, loved, one, commute, able, co...
[initially, need, lockdown, ero...
[time, focus, exercise, sport, do, poured,..
[time, family, realisation, need, ru...

CodePudding user response:

Use apply with a lambda function, converting the word list to a set first so that membership checks are fast:

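# Convert the list to a set: membership tests are O(1) on average
s = set(words395)
# For each row, keep only the words that are not in the set
df['Q395_R'] = df['Q395_R'].apply(lambda x: [y for y in x if y not in s])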
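
Since the question asks for two explicit loops, here is a minimal sketch of the same filtering written as a nested loop. It runs on plain Python lists so it can be tried without pandas. The function name remove_listed_words and the shortened sample rows are hypothetical and only for illustration; they are not from the original post.

```python
# Hypothetical sample data modelled on the question (shortened rows).
rows = [
    ["due", "car", "accident", "year", "ago"],
    ["spending", "time", "loved", "one"],
    ["time", "focus", "exercise", "le", "sport"],
]
words395 = ["rising", "accident", "le", "understanding", "spending"]


def remove_listed_words(rows, unwanted):
    """Return new token lists with every word found in `unwanted` dropped."""
    unwanted = set(unwanted)      # set lookup is faster than list lookup
    cleaned_rows = []
    for tokens in rows:           # outer loop: one row per person
        kept = []
        for word in tokens:       # inner loop: each word in that row
            if word not in unwanted:
                kept.append(word)
        cleaned_rows.append(kept)
    return cleaned_rows


cleaned = remove_listed_words(rows, words395)
print(cleaned)
# [['due', 'car', 'year', 'ago'], ['time', 'loved', 'one'], ['time', 'focus', 'exercise', 'sport']]
```

With the dataframe from the question, the result could be assigned back with df['Q395_R'] = remove_listed_words(df['Q395_R'], words395), assuming every cell really holds a list. The apply version gives the same result in one line; the nested loop only spells out what the list comprehension does.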