How to process all the text files in a corpus folder with Python NLTK

Time:10-22

# NLTK and related packages
import nltk
from nltk.tokenize import sent_tokenize
from nltk.tokenize import PunktSentenceTokenizer
from nltk.corpus import webtext
from nltk.corpus import stopwords
# Open one file and split its text into sentences with NLTK
with open(r'D:\Python\my_corpus\TEM4_2005.txt', encoding="utf-8") as f:
    text = f.read()
corpus_root = r'D:\Python\my_corpus'
# PunktSentenceTokenizer's first argument is training text, not a folder
# path plus a file pattern like r'TEM4.*\.txt', so it is created without them
sent_tokenizer = PunktSentenceTokenizer()
sents = sent_tokenizer.tokenize(text)  # split the text into sentences
# sents[1]
# print(sents)
# Ask for two words and print every sentence that contains both
a = input("Please enter the first word: ")
b = input("Please enter the second word: ")
for sentence in sents:
    if a in sentence:
        if b in sentence:
            print("Sentence containing both words:", sentence)
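The two nested if tests keep a sentence only when it contains both words. Below is a minimal sketch of the same check as a function that accepts any number of query words via all(). The name matching_sentences is hypothetical, and, like the original loop, it does plain substring matching, so "cat" would also match "concatenate".

```python
from typing import List

def matching_sentences(sents: List[str], words: List[str]) -> List[str]:
    """Return the sentences that contain every query word as a substring,
    the same test as the nested 'if a in ...' / 'if b in ...' checks."""
    return [s for s in sents if all(w in s for w in words)]
```

With the script above, the final loop could become: for s in matching_sentences(sents, [a, b]): print(s)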
--------------------------------------------------------------
I am a beginner. What I want is to search for the query words across all the texts in the my_corpus folder, but right now sents = sent_tokenizer.tokenize(text) only works on a single text, because with open reads just one file. How should I rewrite the code? Thanks.
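One way to cover every file is to loop over the folder instead of opening a single file. This is a minimal sketch, assuming the files sit directly inside the corpus folder, are UTF-8 encoded, and are selected by a regular expression on the file name such as r'TEM4.*\.txt' (matched case-sensitively). corpus_sentences and naive_split are hypothetical helper names; naive_split is a crude standard-library stand-in so the sketch runs without NLTK, and any function that turns a string into a list of sentences can be passed as split.

```python
import re
from pathlib import Path
from typing import Callable, Iterator, List, Tuple

def naive_split(text: str) -> List[str]:
    """Hypothetical stand-in splitter: break after '.', '!' or '?' followed
    by whitespace. An NLTK tokenizer can be passed instead."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def corpus_sentences(corpus_root: str, pattern: str,
                     split: Callable[[str], List[str]] = naive_split
                     ) -> Iterator[Tuple[str, str]]:
    """Yield (file name, sentence) for every file directly inside
    corpus_root whose name fully matches the regular expression pattern."""
    regex = re.compile(pattern)
    for path in sorted(Path(corpus_root).iterdir()):
        if path.is_file() and regex.fullmatch(path.name):
            text = path.read_text(encoding="utf-8")
            for sent in split(text):
                yield path.name, sent
```

In the original script, the with open block and the search loop could then be replaced by: for name, sent in corpus_sentences(r'D:\Python\my_corpus', r'TEM4.*\.txt', split=PunktSentenceTokenizer().tokenize): if a in sent and b in sent: print(name, sent). NLTK also provides PlaintextCorpusReader(corpus_root, r'TEM4.*\.txt'), whose fileids() lists the matching files and whose raw(fileid) returns each file's text, which could take the place of the pathlib loop.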