I have tokenized the text and want to print an error for each sentence that has no verb POS tag, but it prints an error for every single sentence. How should I change it?
sents = nltk.sent_tokenize(text)
for sent in sents:
    tokens = nltk.word_tokenize(sent)
    tagged = nltk.pos_tag(tokens)
    for pos in tagged:
        if 'VB' not in sents:
            print('error')
CodePudding user response:
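import nltk

text = "this sentence has verb. this one not"
sents = nltk.sent_tokenize(text)
for sent in sents:
    has_verb = False  # reset the flag for each sentence
    tokens = nltk.word_tokenize(sent)
    pos_tags = nltk.pos_tag(tokens)
    for pos_tag in pos_tags:
        # each pos_tag is a (token, tag) pair; verb tags contain 'VB'
        if 'VB' in pos_tag[1]:
            has_verb = True
            break
    # report only after every tag in the sentence has been checked
    if not has_verb:
        print(f'error: "{sent}" does not have verb')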
CodePudding user response:
As @baileythegreen pointed out, although your last condition comes after tagging the sentence, if 'VB' not in sents: checks the whole list of sentence strings rather than the tags.
No element of that list is the string 'VB', so the condition is true on every iteration, even if the sentence you are iterating over has a 'VB' tag in it.
You should probably use a flag, e.g. has_VB = False, set before the inner loop. Inside the loop the condition should be if 'VB' in pos[1]: has_VB = True, and after the loop you check if not has_VB: print('error').
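The flag idea can also be written without an explicit flag variable. The following is only a minimal sketch: has_verb is a hypothetical helper name, and the tagged sentences are written by hand to stand in for the (token, tag) pairs that nltk.pos_tag returns, so the tags shown are assumptions rather than real tagger output.

```python
from typing import List, Tuple

# A tagged sentence is a list of (token, tag) pairs, the shape nltk.pos_tag returns.
TaggedSentence = List[Tuple[str, str]]


def has_verb(tagged: TaggedSentence) -> bool:
    """Return True if any tag starts with 'VB' (VB, VBD, VBZ, ...)."""
    return any(tag.startswith("VB") for _token, tag in tagged)


# Hand-written tags for illustration only (assumed, not produced by a tagger).
tagged_sents = [
    [("this", "DT"), ("sentence", "NN"), ("has", "VBZ"), ("verb", "NN"), (".", ".")],
    [("this", "DT"), ("one", "CD"), ("not", "RB")],
]

# Collect the sentences that contain no verb tag, then report each one once.
errors = [tagged for tagged in tagged_sents if not has_verb(tagged)]
for tagged in errors:
    print("error:", " ".join(token for token, _tag in tagged))
```

With real input you would pass nltk.pos_tag(nltk.word_tokenize(sent)) to has_verb for each sentence. Checking the tag with startswith rather than in is a small design choice: it only matches tags that begin with 'VB'.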