When I run the following code snippet, I get the error 'list' object has no attribute 'split':
```
for i in range(len(questions1)):
    # Question strings need to be separated into words
    # Each question needs a unique label
    questions_labeled.append(TaggedDocument(questions1[i].split(), df[df.index == i].qid1))
    questions_labeled.append(LabeledSentence(questions2[i].split(), df[df.index == i].qid2))
    if i % 10000 == 0:
        progress = i/len(questions1) * 100
        print("{}% complete".format(round(progress, 2)))
```
CodePudding user response:
Because a list has no split() method; only string objects have split().
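To make the difference concrete, here is a minimal sketch. The variable names and sample strings are made up for illustration and are not taken from the question's data:

```python
# A string has a split() method that returns a list of words.
question = "this is a sample text"
words = question.split()

# A list does not have split(), so calling it raises AttributeError.
nested = ["this is a sample text", "this is another one"]
try:
    nested.split()
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(words)
print(error_message)
```

So if questions1[i] is itself a list, questions1[i].split() would fail in the same way.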
CodePudding user response:
The questions1 and questions2 objects seem to hold lists of strings (e.g., questions1 = [['this is a sample text', 'this is another one'], ['this is some other text'], ...]), and not just strings (e.g., questions1 = ['this is a sample text', 'this is another one', ...]). Hence the error ('list' object has no attribute 'split'): questions1[i] is a list, and you are trying to split a list instead of a string. One way to solve this is to create a flat list out of each list of lists before iterating over them. For example:
questions1 = [item for sublist in questions1 for item in sublist]
questions2 = [item for sublist in questions2 for item in sublist]
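A small self-contained sketch of the same idea, assuming the data really is nested one level deep as guessed above. The sample data and the flatten helper are hypothetical and only there for illustration:

```python
# Hypothetical sample data with the nested shape assumed in this answer
questions1 = [
    ["this is a sample text", "this is another one"],
    ["this is some other text"],
]


def flatten(nested):
    """Flatten one level of nesting: a list of lists becomes a flat list."""
    return [item for sublist in nested for item in sublist]


flat_questions1 = flatten(questions1)

# Every element is now a string, so split() works on each of them.
tokenized = [question.split() for question in flat_questions1]

print(len(questions1), len(flat_questions1))
print(tokenized)
```

Note that if any inner list holds more than one string, the flattened list is longer than the original, so the index i may no longer line up with the rows of df. It is worth comparing len(questions1) before and after flattening before using i to look up qid1 and qid2.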