'list' object has no attribute 'split' in an NLP question

When I run the following code snippet, I get the error 'list' object has no attribute 'split':


```
for i in range(len(questions1)):
    # Question strings need to be separated into words
    # Each question needs a unique label
    questions_labeled.append(TaggedDocument(questions1[i].split(), df[df.index == i].qid1))
    questions_labeled.append(LabeledSentence(questions2[i].split(), df[df.index == i].qid2))
    if i % 10000 == 0:
        progress = i/len(questions1) * 100
        print("{}% complete".format(round(progress, 2)))
```

CodePudding user response：

Because a list has no split() method; only string objects have split().
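As a minimal sketch of that point, the snippet below reproduces the error on made-up data. The names `sentence` and `nested` are hypothetical and are not taken from the asker's code.

```python
# Hypothetical sample data; the asker's real questions1 is not shown.
sentence = "this is a sample text"
nested = ["this is a sample text", "this is another one"]

# str.split() works on a single string and returns a list of words.
words = sentence.split()

# Calling .split() on a list raises the AttributeError from the question.
error_message = None
try:
    nested.split()
except AttributeError as exc:
    error_message = str(exc)

print(words)
print(error_message)
```

Printing type(questions1[0]) in the original code is a quick way to check whether each element is a str or a list.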

CodePudding user response：

The questions1 and questions2 objects seem to be lists of lists of strings (e.g., questions1 = [['this is a sample text', 'this is another one'], ['this is some other text'], ...]), and not flat lists of strings (e.g., questions1 = ['this is a sample text', 'this is another one', ...]). Hence the error (i.e., 'list' object has no attribute 'split'): questions1[i] is a list, so you are trying to split a list instead of a string. One way to solve this is to create a flat list out of each list of lists before iterating over them. For example:

```
questions1 = [item for sublist in questions1 for item in sublist]
questions2 = [item for sublist in questions2 for item in sublist]
```
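To illustrate the flattening step end to end, here is a small sketch on made-up data shaped like the example above. It uses itertools.chain.from_iterable, which should give the same result as the nested list comprehension. The names `flat_questions1` and `tokenized` are hypothetical. Note that flattening changes the length of the list, so if the data really is nested, the index i may no longer line up with df.index in the original loop; that is worth checking before relying on it.

```python
from itertools import chain

# Hypothetical nested data shaped like the example in the answer above.
questions1 = [
    ["this is a sample text", "this is another one"],
    ["this is some other text"],
]

# Flatten one level: a list of lists of strings becomes a list of strings.
flat_questions1 = list(chain.from_iterable(questions1))

# Each element is now a str, so .split() works.
tokenized = [question.split() for question in flat_questions1]

print(len(flat_questions1))
print(tokenized[0])
```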