I am trying to go through every cell in the 'Text' column of a CSV file and create a new column named 'Type' that holds the type of each text, as predicted by Multinomial Naive Bayes.
This is the code:
from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
dataset = pd.read_csv("Test.csv", encoding='latin-1')
clf = MultinomialNB()
cv = CountVectorizer()
for row in dataset:
    text = row['Text']
    data = cv.transform([text]).toarray()
    output = clf.predict(data)
dataset['Type']=dataset[output]
This is my error:
text= row['Text']
TypeError: string indices must be integers
Answer:
The method used to iterate through the rows of the data frame is incorrect. So here
for row in dataset:
does not return rows at all. Iterating over a DataFrame directly yields its column labels, which are normally strings. So when we do:
text= row['Text']
Python tries to index the string held in row with 'Text', and string indices can only be integers, hence the error.
For example:
text = "abc"
print(text[0])      # Output is 'a'
print(text['abc'])  # TypeError: string indices must be integers
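To see what the loop in the question actually receives, here is a small sketch. The two-row frame is made-up illustration data standing in for Test.csv, and the 'Id' column is hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for Test.csv (the 'Id' column is invented for illustration)
dataset = pd.DataFrame({"Text": ["hello world", "buy now"], "Id": [1, 2]})

# Iterating over a DataFrame yields its column labels, not its rows
labels = [row for row in dataset]
print(labels)  # ['Text', 'Id']

# Each label is a plain str, so indexing it with 'Text' raises TypeError
try:
    labels[0]["Text"]
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the message varies slightly between Python versions, but it always starts with "string indices must be integers".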
So the correct way to iterate through rows and extract the required column's value would be:
for index, row in dataset.iterrows():
    text = row["Text"]
For more information about the iterrows function, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
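Putting it together, the following is a minimal sketch rather than a definitive solution. It assumes you have some labelled training texts: the question does not show any, so train_texts, train_labels and the sample frame below are invented. Note that both CountVectorizer and MultinomialNB have to be fitted before transform and predict can be called, which the code in the question never does.

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled training data (not part of the original question)
train_texts = ["cheap pills buy now", "win money now",
               "meeting at noon", "lunch with the team"]
train_labels = ["spam", "spam", "ham", "ham"]

# Fit the vectorizer and the classifier before predicting anything
cv = CountVectorizer()
clf = MultinomialNB()
clf.fit(cv.fit_transform(train_texts), train_labels)

# Hypothetical stand-in for pd.read_csv("Test.csv", encoding="latin-1")
dataset = pd.DataFrame({"Text": ["buy cheap pills", "team meeting at noon"]})

# Row-by-row version, using iterrows as described above
types = []
for index, row in dataset.iterrows():
    data = cv.transform([row["Text"]])
    types.append(clf.predict(data)[0])
dataset["Type"] = types

# Same result without an explicit loop: transform the whole column at once
dataset["Type_vectorized"] = clf.predict(cv.transform(dataset["Text"]))
print(dataset)
```

Collecting the predictions in a list and assigning the column once avoids rewriting the 'Type' column on every iteration. The version without a loop is usually the simpler choice when the whole column fits in memory.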