Error when passing argument through function for converting pandas dataframe of tweets into corpus f

Time:07-01

I want to prepare my text data that is in a pandas dataframe for sentiment analysis with nltk. For that, I'm using code for a function that converts each row of a pandas dataframe into a corpus.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        date=r['Date']
        tweet=r['Text']
        place=r['Place']
        fname=str(date) + '_' + '.txt'
        corpusfile=open(corpusfolder + '/' + fname,'a')
        corpusfile.write(str(tweet) + " " + str(date))
        corpusfile.close()
CreateCorpusFromDataFrame(myfolder,mydf)

The problem is I keep getting the message that

NameError: name 'myfolder' is not defined

even though I have a folder called 'myfolder' in the same directory as the Jupyter notebook that contains my code.

UPDATE:

I can see now that the issue was simply that I needed to pass the folder name as a string. Now that I've done that and amended my code, the problem is that the contents of the text file created by the function are not being turned into a corpus, and the variable I assign the result to has type 'NoneType'.

import nltk
# convert each row of the pandas dataframe of tweets into corpus files
def CreateCorpusFromDataFrame(corpusfolder,df):
    for index, r in df.iterrows():
        id=r['Date']
        tweet=r['Text']
        #place=r['Place']
        #fname=str(date) + '_' + '.txt'
        fname='tweets' + '.txt'
        corpusfile=open(corpusfolder + '/' + fname,'a')
        corpusfile.write(str(tweet) + " ")
        corpusfile.close()
corpusdf = CreateCorpusFromDataFrame('myfolder',mydf)
type(corpusdf)
NoneType

CodePudding user response:

Problem

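You are passing myfolder as a bare name, so Python looks for a variable called myfolder. No such variable is defined in your code, so it raises a NameError. Having a folder with that name on disk does not define a Python variable.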

Solution

Just replace it with 'myfolder' [pass it as a string].

CreateCorpusFromDataFrame('myfolder',mydf)
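
On the follow-up in the UPDATE: the function has no return statement, so Python returns None from it, which is why type(corpusdf) is NoneType. The text is still written to the file; it is just not handed back to the caller. Below is a minimal sketch of one way to return something useful, here the path of the file that was written. The name create_corpus_from_rows, the sample rows, and the temporary folder are all assumptions made for illustration; the sketch takes plain dicts so it runs without pandas, and with your data you could probably pass mydf.to_dict('records') instead.

```python
import os
import tempfile


def create_corpus_from_rows(corpus_folder, rows):
    """Append each row's 'Text' value to tweets.txt and return the file path."""
    os.makedirs(corpus_folder, exist_ok=True)
    path = os.path.join(corpus_folder, "tweets.txt")
    # Open the file once instead of reopening it for every row.
    with open(path, "a", encoding="utf-8") as corpus_file:
        for row in rows:
            corpus_file.write(str(row["Text"]) + " ")
    # Without this return, the caller would receive None.
    return path


# Hypothetical sample data standing in for mydf.to_dict('records').
sample_rows = [
    {"Date": "2021-07-01", "Text": "first tweet"},
    {"Date": "2021-07-02", "Text": "second tweet"},
]

# A temporary folder is used here so the sketch leaves nothing behind.
with tempfile.TemporaryDirectory() as folder:
    corpus_path = create_corpus_from_rows(folder, sample_rows)
    with open(corpus_path, encoding="utf-8") as f:
        contents = f.read()
```

If you want an NLTK corpus object rather than a path, one option is to point nltk.corpus.reader.PlaintextCorpusReader at the folder after the files have been written, and return or assign that reader instead.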