I want to prepare text data stored in a pandas DataFrame for sentiment analysis with nltk. To do that, I'm using a function that writes each row of the DataFrame to a corpus file.
import nltk

# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder, df):
    for index, r in df.iterrows():
        date = r['Date']
        tweet = r['Text']
        place = r['Place']
        fname = str(date) + '_' + '.txt'
        corpusfile = open(corpusfolder + '/' + fname, 'a')
        corpusfile.write(str(tweet) + " " + str(date))
        corpusfile.close()

CreateCorpusFromDataFrame(myfolder, mydf)
The problem is that I keep getting this error:
NameError: name 'myfolder' is not defined
This happens even though I have a folder called 'myfolder' in the same directory as the Jupyter notebook that contains my code.
UPDATE:
I can see now that the issue was simply that I needed to pass the folder name as a string. I've done that and amended my code. The problem I have now is that the contents of the text file created by the function are not being written into a corpus, and the variable I get back from the function is of type 'NoneType'.
import nltk

# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder, df):
    for index, r in df.iterrows():
        id = r['Date']
        tweet = r['Text']
        #place = r['Place']
        #fname = str(date) + '_' + '.txt'
        fname = 'tweets' + '.txt'
        corpusfile = open(corpusfolder + '/' + fname, 'a')
        corpusfile.write(str(tweet) + " ")
        corpusfile.close()

corpusdf = CreateCorpusFromDataFrame('myfolder', mydf)
type(corpusdf)
NoneType
Answer:
Problem
You are passing myfolder to your function as a variable, but you never defined that variable in your code, so Python raises a NameError. Having a folder with that name on disk does not create a Python variable.
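To make the difference concrete, here is a small sketch (not taken from the question's code) that contrasts a bare name with a string literal. The variable names folder_name and message are only illustrative.

```python
# A string literal is just text; it does not need to be defined beforehand.
folder_name = "myfolder"

try:
    # A bare name makes Python look up a variable called myfolder.
    # No such variable exists here, so the lookup fails.
    print(myfolder)
except NameError as err:
    message = str(err)
    print(message)
```

The folder on disk plays no part in this lookup; Python only checks the names defined in the running session.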
Solution
Just replace it with 'myfolder' (that is, pass the folder name as a string).
CreateCorpusFromDataFrame('myfolder',mydf)
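Regarding the UPDATE: the function has no return statement, so calling it always evaluates to None, which is why type(corpusdf) shows NoneType. Below is a minimal sketch of one way to get something back, assuming the 'Text' column from the question. The function name create_corpus_from_dataframe, the fname parameter, the sample DataFrame, and the temporary folder are my own stand-ins, not part of the original code.

```python
import os
import tempfile

import pandas as pd


def create_corpus_from_dataframe(corpus_folder, df, fname="tweets.txt"):
    """Append every tweet in df to one text file and return that file's path."""
    os.makedirs(corpus_folder, exist_ok=True)
    path = os.path.join(corpus_folder, fname)
    with open(path, "a", encoding="utf-8") as corpus_file:
        for _, row in df.iterrows():
            corpus_file.write(str(row["Text"]) + "\n")
    return path  # without this line the call would evaluate to None


# Hypothetical stand-in for mydf from the question
mydf = pd.DataFrame(
    {"Date": ["2021-01-01", "2021-01-02"], "Text": ["good day", "bad day"]}
)

# A temporary folder is used here so the sketch does not touch 'myfolder'
corpus_folder = tempfile.mkdtemp()
corpus_path = create_corpus_from_dataframe(corpus_folder, mydf)

# Read the file back to confirm the tweets were written
with open(corpus_path, encoding="utf-8") as f:
    corpus_text = f.read()
print(type(corpus_path))
```

If the goal is an nltk corpus object rather than a file path, one option would be to point nltk's PlaintextCorpusReader (from nltk.corpus.reader) at the folder once the file has been written, but whether that suits your sentiment analysis pipeline is something you would need to check.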