Happy New Year, everyone! I have a large df with texts:
target = [['cuantos festivales conciertos sobre todo persona perdido esta pandemia'],
['existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos'],
['pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania'],
['nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo']]
and 4 sets of words like:
words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre']
Moreover, an entry in a set may contain spaces, as in 'lugares de trabajo'.
For each line, I need the total number of times the words from each list appear (I don't need a separate count for each individual word),
so the result df looks like:
   word_set1  word_set_2
1          1           1
2          0           1
3          2           1
4          1           0
I tried this for the counting (then I planned to just sum the results):
for terms in words1:
    df[str(terms)] = map(lambda x: x.count(str(terms)), target['tokenized'])
but got TypeError: object of type 'map' has no len()
How can I count the words? Thanks in advance for the answers
CodePudding user response:
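We can use the str.count method to get the expected result. It takes a regular expression, so joining the words with '|' counts every occurrence of any word in the set: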
df['word_set1'] = df['text'].str.count('|'.join(words1))
df['word_set2'] = df['text'].str.count('|'.join(words2))
Output:
                                                text  word_set1  word_set2
0  cuantos festivales conciertos sobre todo perso...          2          2
1  existe impresión estar entrando últimos tiempo...          0          1
2  pandemia sigue hambre acecha humanidad faltaba...          3          1
3  nombre maría ángeles todas mujeres sido asesin...          2          0
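For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the frame from the sample rows in the question. It assumes the texts live in a column named text, as in the snippet above. The helper build_pattern is hypothetical, not part of pandas: it strips stray whitespace from the terms (such as 'persona '), escapes them with re.escape in case a term contains regex metacharacters, and wraps the alternation in \b word boundaries so only whole words match. That last choice is stricter than the plain '|'.join above, which also matches inside longer words, so drop the \b if substring matches are what you want. The last lines show that the loop from the question also works once the lazy map object is turned into a list.

```python
import re
import pandas as pd

texts = [
    "cuantos festivales conciertos sobre todo persona perdido esta pandemia",
    "existe impresión estar entrando últimos tiempos pronto tarde mayoría vivimos sufriremos",
    "pandemia sigue hambre acecha humanidad faltaba mueren inundaciones bélgica alemania",
    "nombre maría ángeles todas mujeres sido asesinadas hecho serlo esta pandemia lugares de trabajo",
]
words1 = ['festivales', 'pandemia', 'lugares de trabajo', 'mueren', 'faltaba']
words2 = ['persona ', 'faltaba', 'entrando', 'sobre']

df = pd.DataFrame({"text": texts})


def build_pattern(terms):
    """Hypothetical helper: turn a list of terms into one whole-word regex."""
    # Strip stray spaces and put longer terms first so they win the alternation.
    cleaned = sorted((t.strip() for t in terms), key=len, reverse=True)
    # re.escape keeps characters such as '.' or '+' from being read as regex syntax.
    return r"\b(?:" + "|".join(re.escape(t) for t in cleaned) + r")\b"


df["word_set1"] = df["text"].str.count(build_pattern(words1))
df["word_set2"] = df["text"].str.count(build_pattern(words2))

# The loop from the question, fixed: map() is lazy, so materialise it with list()
# before building a Series, then add up the per-term counts (plain substring counts).
loop_total = sum(
    pd.Series(list(map(lambda x: x.count(t), df["text"]))) for t in words1
)

print(df[["word_set1", "word_set2"]])
```

On the four sample rows both approaches give the same totals as the output above; they can diverge on real data wherever a term appears inside a longer word.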