count the (total) number of special words in large pandas df

Happy New Year, everyone! I have a large df with texts:

target = [['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
['existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos'],
['pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania'],
['nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo']]

and 4 sets of words like:

words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre'] 

Moreover, words from a set may contain spaces, like 'lugares de trabajo'.
I need to count the total number of times the words from a set appear in each line (I don't need per-word counts), so the resulting df looks like:

   word_set1  word_set_2
1          1           1
2          0           1
3          2           1
4          1           0

I tried this for counting (then I planned to just sum up the results):

for terms in words1:
    df[str(terms)] = map(lambda x: x.count(str(terms)), target['tokenized'])

but got TypeError: object of type 'map' has no len()

How can I count the words? Thanks in advance for the answers.

CodePudding user response:

We can use the str.count method to get the expected result:

# build a regex alternation from each word list and count all matches per row
df['word_set1'] = df['text'].str.count('|'.join(words1))
df['word_set2'] = df['text'].str.count('|'.join(words2))

Output:

    text                                                word_set1   word_set2
0   cuantos festivales conciertos sobre todo perso...   2           2
1   existe impresión estar entrando últimos tiempo...   0           1
2   pandemia sigue hambre acecha humanidad faltaba...   3           1
3   nombre maría ángeles todas mujeres sido asesin...   2           0
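
For completeness, here is a minimal, self-contained sketch of the same approach. It assumes the texts from target are loaded into a column named text (the DataFrame construction and the re.escape step are additions for illustration, not part of the original answer); re.escape makes sure words containing regex special characters are matched literally:

import re
import pandas as pd

target = [['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
          ['existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos'],
          ['pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania'],
          ['nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo']]

words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre']

# each row of target is a one-element list, so take its first item as the text
df = pd.DataFrame({'text': [row[0] for row in target]})

# escape every word so regex metacharacters are treated literally,
# then join the words with '|' into one alternation pattern per set
pattern1 = '|'.join(re.escape(w) for w in words1)
pattern2 = '|'.join(re.escape(w) for w in words2)

# str.count returns the total number of pattern matches in each row
df['word_set1'] = df['text'].str.count(pattern1)
df['word_set2'] = df['text'].str.count(pattern2)

print(df)

This reproduces the counts shown above. Note that str.count matches substrings, so 'sobre' would also be counted inside a longer word; wrapping each escaped word in \b word boundaries is one way to restrict matching to whole words if that matters for your data.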