How to find the list of words in the corpus

Time:08-27

For each word in the list c, I need to count how many rows of the corpus contain that word.

I expect the answer to be [1,3,2,4,1,1,4,1,4].

That means the word "and" is present only in row 3, hence the answer "1".

The word "document" is present in row 1, row 2 and row 4, hence the answer "3", and so on.

Kindly correct my program, and if you know an easier way, please suggest that too. Thank you.

corpus= [
         'this is the first document',            #row1
         'this document is the second document',  #row2
         'and this is the third one',             #row3
         'is this the first document',            #row4
    ]

c=['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

a=[]
count=0

for words in c:
  a.append(count)
  count=0
  for row in corpus:
    if words in row:
      count=count+1
print(a)

CodePudding user response:

Your only problem is that you call append() in the wrong place.

You have to call it after the inner for-loop, once the count for the current word is complete.

a=[]

for words in c:
  count=0
  for row in corpus:
    if words in row:
      count=count+1
  a.append(count)   # append once per word, after all rows have been checked

print(a)
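
Since the question also asks for an easier way, the same loop can be condensed into a list comprehension. This is only a sketch of one possible shorter form; the names `row_words` and `rows_containing` are made up for illustration. It also splits each row into words first, because `words in row` on a string is a substring test and would, for example, find "is" inside "this". For this corpus both approaches give the same numbers.

```python
corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]

c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

# Split every row once into a set of whole words.
row_words = [set(row.split()) for row in corpus]

# For each word, count the rows whose word set contains it.
rows_containing = [sum(word in words for words in row_words) for word in c]

print(rows_containing)  # prints [1, 3, 2, 4, 1, 1, 4, 1, 4]
```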

CodePudding user response:

This seems to be functional.

from collections import Counter

words = []
for corp in corpus:
    words.extend(corp.split())

word_counts = Counter(words)

word_counts_list = []
for word in c:
    if word not in word_counts:
        word_counts_list.append(0)
    else:
        word_counts_list.append(word_counts[word])

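This is not the result you were expecting, because it counts every occurrence of a word rather than the number of rows that contain it: "document" appears twice in row 2, so it scores 4 instead of 3.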

word_counts_list
Out[136]: [1, 4, 2, 4, 1, 1, 4, 1, 4]
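
If the goal is the number of rows that contain each word rather than the total number of occurrences, the Counter approach can be adapted by removing duplicate words within each row before counting. A minimal sketch, reusing `corpus` and `c` from the question; `row_counts` and `row_counts_list` are illustrative names only:

```python
from collections import Counter

corpus = [
    'this is the first document',            # row1
    'this document is the second document',  # row2
    'and this is the third one',             # row3
    'is this the first document',            # row4
]

c = ['and', 'document', 'first', 'is', 'one', 'second', 'the', 'third', 'this']

row_counts = Counter()
for row in corpus:
    # set() drops repeats within a row, so each row adds at most 1 per word.
    row_counts.update(set(row.split()))

# A Counter returns 0 for missing keys, so no membership check is needed.
row_counts_list = [row_counts[word] for word in c]

print(row_counts_list)  # prints [1, 3, 2, 4, 1, 1, 4, 1, 4]
```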