Here is the code. Can anyone spot the error?
file = open("WSJ_02-21.pos-chunk", 'r')
lines = file.readlines()
input_list = [[0 for j in range(5)] for i in range(len(lines))]
for i in range(len(input_list)):
    input_line = lines[i].split("\t")
    if len(input_line) == 0:
        for j in range(len(input_list[i])):
            input_list[i][j] = ""
    elif len(input_line) == 3:
        for j in range(len(input_list[i])):
            input_list[i][j] = input_line[i][j]
Here is the error
Traceback (most recent call last):
  File "C:/Users/inigo/PycharmProjects/NLPHW5/main.py", line 12, in <module>
    input_list[i][j] = input_line[i][j]
IndexError: string index out of range
My expected output is a two-dimensional list containing the elements of WSJ_02-21.pos-chunk.
Link to the input file: https://drive.google.com/file/d/1QLMfD9HhvshhqE7XqIn96ML-M0j2uNLh/view?usp=sharing
CodePudding user response:
The purpose of the code isn't completely clear, but if I understand it correctly the following code seems to be what you are trying to achieve:
with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip().split('\t')
        if input_line == ['']:
            # ''.split('\t') returns [''], not [], so compare against that
            # instead of testing for length 0
            input_list.append([''])
        elif len(input_line) == 3:
            input_list.append(input_line)
But do you really want entries for blank lines?
If not, the following might be even better:
with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip()
        if len(input_line) > 0:
            input_list.append(input_line.split('\t'))
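As a side note, here is a quick check of how str.split behaves on a blank line, which is why testing the stripped line itself is safer than testing for a zero-length list. This is only a minimal sketch: sample_lines is made-up data, and it assumes the real file is tab-separated with three fields per line and blank lines in between.

```python
# Hypothetical sample lines (not taken from the real file).
sample_lines = ["In\tIN\tB-PP\n", "\n", "an\tDT\tB-NP\n"]

# A blank line still yields one (empty) field after split, so its length is 1, not 0.
blank_fields = sample_lines[1].strip().split("\t")

# Skipping blank lines before splitting gives one list of fields per non-empty line.
rows = [line.strip().split("\t") for line in sample_lines if line.strip()]
```

Here blank_fields is a one-element list holding an empty string, so a check like len(...) == 0 never matches it.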
CodePudding user response:
If the line you pass is like:
lines = ["avd\tbdc\tcdc"]
then input_line will have 3 tokens (so it ends up in the elif branch), but input_list[i] has 5 slots (the default length you imposed on each row of input_list), so j runs from 0 to 4. On top of that, input_line[i][j] picks the i-th token and then its j-th character, so as soon as j reaches the length of that token you end up out of range:
input_list[i][j] = input_line[i][j]
IndexError: string index out of range
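To make this concrete, here is a small reproduction using the one-line example above. It is a sketch rather than the asker's full program: row stands in for a single input_list[i], and fixed_row shows one possible way (an assumption about what was intended) to copy whole tokens instead of characters.

```python
# Hypothetical one-line input, mirroring the example above.
lines = ["avd\tbdc\tcdc"]
i = 0
input_line = lines[i].split("\t")   # three tokens
row = [0 for j in range(5)]         # same 5-slot row as in the question

error_message = None
try:
    for j in range(len(row)):
        # input_line[i] is the string 'avd'; [j] then picks its j-th character
        row[j] = input_line[i][j]
except IndexError as exc:
    error_message = str(exc)

# Copying whole tokens, bounded by the number of tokens, avoids the error.
fixed_row = ["" for _ in range(5)]
for j, token in enumerate(input_line):
    fixed_row[j] = token
```

The loop fails once j reaches 3, because 'avd' only has three characters, while fixed_row ends up with the three tokens followed by two empty strings.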