TypeError: string indices must be integers; how can I fix this problem in my code? [duplicate]

I am trying to go through every cell in the 'Text' column of a CSV file and create a new column named 'Type' that holds the type of each text, as predicted by Multinomial Naive Bayes.

This is the code:

from sklearn.naive_bayes import MultinomialNB
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
dataset = pd.read_csv("Test.csv", encoding='latin-1')

clf = MultinomialNB()
cv = CountVectorizer()


for row in dataset:
    text= row['Text']
    data = cv.transform([text]).toarray()
    output = clf.predict(data)
    dataset['Type']=dataset[output]

This is my error:

text= row['Text']
TypeError: string indices must be integers

Answer:

The loop does not iterate over the rows of the DataFrame. Here

for row in dataset:

iterates over the column labels, not the rows, so on each pass row is a column name, which is a string. When the code then runs text= row['Text'], it tries to index that string with the key 'Text'. String indices can only be integers, hence the error.
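The behaviour is easy to check on a small DataFrame. The sketch below uses a made-up two-column frame as a stand-in for Test.csv (the 'Id' column and the sample strings are assumptions, not part of the question):

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
dataset = pd.DataFrame({"Text": ["first message", "second message"], "Id": [1, 2]})

# Iterating over the DataFrame itself walks the column labels, not the rows.
labels = [item for item in dataset]
print(labels)

# Each label is a plain string, which is why row['Text'] raises the TypeError.
print(type(labels[0]))
```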

For example:

text = "abc"
print(text[0])      # Output is 'a'
print(text['abc'])  # Error - string indices must be integers

One way to iterate through the rows and extract the required column's value is iterrows, which yields an (index, row) pair for each row:

for index, row in dataset.iterrows():
    text = row["Text"]
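A minimal sketch of the loop in use, again on a made-up frame. The per-row result here is simply the text length, a placeholder for whatever the model would return, and the 'Length' column name is hypothetical. Results are collected in a list and assigned once after the loop, rather than assigning to the column inside the loop as the question's code does:

```python
import pandas as pd

# Hypothetical stand-in for the CSV in the question.
dataset = pd.DataFrame({"Text": ["first message", "second message"]})

lengths = []
for index, row in dataset.iterrows():
    # row is a pandas Series holding one row; columns are accessed by label.
    text = row["Text"]
    lengths.append(len(text))

# Assign the collected values once, after the loop, as a new column.
dataset["Length"] = lengths
print(dataset)
```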

For more information about the iterrows function, see: https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.iterrows.html
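Fixing the loop alone will probably not be enough, because the question's code never fits cv or clf before calling transform and predict. A rough sketch of the whole task without a row loop follows. The labelled training data is invented for illustration (the question does not show any), and the small dataset frame stands in for Test.csv:

```python
import pandas as pd
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical labelled training data; the question does not show any.
train = pd.DataFrame({
    "Text": ["cheap pills buy now", "win money now",
             "meeting at noon", "lunch with the team"],
    "Type": ["spam", "spam", "ham", "ham"],
})

# Stand-in for pd.read_csv("Test.csv", encoding="latin-1").
dataset = pd.DataFrame({"Text": ["buy cheap pills", "team meeting at noon"]})

cv = CountVectorizer()
clf = MultinomialNB()

# Both objects must be fitted before transform/predict can be called.
X_train = cv.fit_transform(train["Text"])
clf.fit(X_train, train["Type"])

# Transform and predict the whole column at once; no row loop is needed.
dataset["Type"] = clf.predict(cv.transform(dataset["Text"]))
print(dataset)
```

Both transform and predict accept many samples at a time, so passing the whole 'Text' column is usually simpler and faster than predicting row by row.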