Python text processing: AttributeError: 'list' object has no attribute 'lower'

I am new to Python, Stack Overflow, and Python GUIs (please be gentle), and I am trying to learn how to do a KNN analysis. I am using a combination of code snippets and code I built myself, and I get this error:

doc = doc.lower()
AttributeError: 'list' object has no attribute 'lower'

This is my code:

from tkinter import Button, Label, StringVar, ttk
import pandas as pd

# `top` is the main Tk window, created earlier in the program (not shown)

# Both comboboxes offer the same categories
CATEGORIES = (' Computer Science',
              ' computer engineering',
              ' Information Technology',
              ' artificial intelligence',
              ' cyber security',
              ' computer networks',
              ' Information Security',
              ' Management Information Systems',
              ' Software engineering',
              ' data analysis',
              ' Data Science')

selct = StringVar()
categorychoosen = ttk.Combobox(top, width=27, textvariable=selct)
categorychoosen['values'] = CATEGORIES
categorychoosen.grid(row=1, column=2)
categorychoosen.current()

s = StringVar()
choosen = ttk.Combobox(top, width=27, textvariable=s)
choosen['values'] = CATEGORIES
choosen.grid(row=1, column=3)
choosen.current()

def model():
    
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from scipy.sparse import hstack
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.neighbors import KNeighborsClassifier

    resume = pd.read_csv(r'/Users/asma/Desktop/UpdatedResumeDataSet.csv')

    #DATA
    x = resume['Resume'].values
    y = resume['Category'].values
    v = [[selct.get(),s.get()]]

    #transform
    word = TfidfVectorizer(sublinear_tf=True, stop_words='english')
    word.fit(x)
    wordFeatures = word.transform(x)
    
    w = TfidfVectorizer(sublinear_tf=True, stop_words='english')
    w.fit(v)
    wx = word.transform(v)

    # to 2D Array
    wx.reshape(-1, 1)
    wordFeatures.reshape(-1, 1)
    x.reshape(-1, 1)

    #KNN 
    model = KNeighborsClassifier(n_neighbors=5, metric= 'euclidean')
    model.fit(wordFeatures,y)
    x_test = wx
    y_pred = model.predict([x_test])
    jobR = Label(top,text=str([y_pred]) ,bg='light gray').grid(row=4,column=2)

but= Button(top,text="Start",bg='gray', command=model).grid(row=3,column=0)

Where should I call lower(): before or after the transform step? And which data should I apply it to: resume['Resume'].values or [[selct.get(), s.get()]]?

Any help would be massively appreciated.

Answer:

doc is a list object, which contains elements. You are calling its lower() method, but a list has no such method; the list's items, however, may have one.

A list is a data structure that contains items, and it should not be confused with those items.

lower() is a method of str, which makes it highly likely that the items in your list are strings.
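A quick way to see the difference between the list and its items is to check which of them actually has a lower() method. The sample list below is a hypothetical stand-in shaped like the combobox values in the question:

```python
def compare_list_and_items():
    # A list of strings, shaped like the combobox values in the question
    doc = [" Computer Science", " Data Science"]
    list_has_lower = hasattr(doc, "lower")                    # False: list has no lower()
    items_have_lower = all(hasattr(d, "lower") for d in doc)  # True: every str item does
    try:
        doc.lower()  # calling a str method on the list itself
    except AttributeError as exc:
        message = str(exc)
    return list_has_lower, items_have_lower, message


print(compare_list_and_items())
# (False, True, "'list' object has no attribute 'lower'")
```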

You can use map() to convert the string items of a list to lower case; see more here: https://www.delftstack.com/howto/python/python-lowercase-list/
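A minimal sketch of the map() approach, assuming the list holds only strings (the function name lowercase_items is hypothetical):

```python
def lowercase_items(items):
    """Return a new list with every string item lower-cased."""
    # map() calls str.lower on each item; list() materialises the lazy map object
    return list(map(str.lower, items))


choices = [" Computer Science", " Data Science"]
print(lowercase_items(choices))  # [' computer science', ' data science']
```

Passing the unbound str.lower avoids a lambda; if the list could contain non-strings, it would raise a TypeError on the first non-str item.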

Answer:

Please add the line where you define "doc".

As far as I can tell from the error message, you are trying to apply a string method to a list. I assume you want to apply the .lower() method to the strings collected in doc.

So try this list comprehension, which applies .lower() to each string element and builds a new list from the results:

doc = [d.lower() for d in doc]
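The comprehension assumes doc is a flat list of strings. In the question, the value handed to the vectorizer is v = [[selct.get(), s.get()]], a list whose only item is another list, so iterating over it yields a list and d.lower() would fail the same way. One option, sketched here under that assumption, is to join each inner list into a single string first (the helper name to_documents is hypothetical):

```python
def to_documents(rows):
    """Turn a list of lists of strings into a flat list of lower-cased strings,
    one string per inner list (each inner list becomes one document)."""
    # Strip each item, join the items of one inner list with spaces, then
    # lower-case the resulting str, which does have a .lower() method
    return [" ".join(item.strip() for item in row).lower() for row in rows]


v = [[" Computer Science", " Data Science"]]  # same shape as [[selct.get(), s.get()]]
print(to_documents(v))  # ['computer science data science']
```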

Hope this helps; otherwise, please give us more details.
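Applied to the code in the question: TfidfVectorizer expects a list of strings (one string per document), and its lowercase option is True by default, so it already calls lower() on every document and no manual lower() is needed. The error appears because [[selct.get(), s.get()]] passes a list as the single "document". A minimal sketch of one way to restructure model(), assuming the Resume and Category columns from the question; the function name predict_category is hypothetical, and this is one possible layout rather than the only fix:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier


def predict_category(resumes, categories, first_choice, second_choice, n_neighbors=5):
    """Fit TF-IDF + KNN on the resume texts, then classify the two selected
    combobox values, joined into a single query document."""
    # TfidfVectorizer lower-cases each document itself (lowercase=True is the
    # default), so no manual .lower() call is needed anywhere.
    vectorizer = TfidfVectorizer(sublinear_tf=True, stop_words="english")
    features = vectorizer.fit_transform(resumes)  # sparse, shape (n_resumes, n_terms)

    knn = KNeighborsClassifier(n_neighbors=n_neighbors, metric="euclidean")
    knn.fit(features, categories)

    # One string per document: a list of strings, not a list of lists.
    query = [f"{first_choice} {second_choice}"]
    # Reuse the vectorizer fitted on the resumes so the query gets the same
    # columns; transform() already returns a 2-D (1, n_terms) matrix, so no
    # reshape or extra [...] wrapping is needed before predict().
    query_features = vectorizer.transform(query)
    return knn.predict(query_features)[0]
```

In the Start-button callback this could be called as predict_category(resume['Resume'].values, resume['Category'].values, selct.get(), s.get()), with the returned label passed to Label(top, text=...). This also drops the second, unused vectorizer and the reshape calls, which return new objects and leave the originals unchanged.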