I keep getting the error "string index out of range". I have made sure that the array has e...

Here is the code. Can anyone spot the error?

file = open("WSJ_02-21.pos-chunk", 'r')
lines = file.readlines()
input_list = [[0 for j in range(5)] for i in range(len(lines))]
for i in range(len(input_list)):
    input_line = lines[i].split("\t")
    if len(input_line) == 0:
        for j in range(len(input_list[i])):
            input_list[i][j] = ""
    elif len(input_line) == 3:
        for j in range(len(input_list[i])):
            input_list[i][j] = input_line[i][j]

Here is the error

Traceback (most recent call last):
  File "C:/Users/inigo/PycharmProjects/NLPHW5/main.py", line 12, in <module>
    input_list[i][j] = input_line[i][j]
IndexError: string index out of range

My expected output is a two-dimensional list containing the elements of WSJ_02-21.pos-chunk

Link to the input file: https://drive.google.com/file/d/1QLMfD9HhvshhqE7XqIn96ML-M0j2uNLh/view?usp=sharing

CodePudding user response：

The purpose of the code isn't completely clear, but if I understand it correctly, the following code seems to be what you are trying to achieve:

with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip().split('\t')
        if input_line == ['']:  # a blank line splits to [''], never to an empty list
            input_list.append([''])
        elif len(input_line) == 3:
            input_list.append(input_line)

But -- do you really want to have entries for blank lines?

If not, the following might be even better:

with open("WSJ_02-21.pos-chunk", 'r') as f:
    input_list = []
    for line in f:
        input_line = line.strip()
        if len(input_line) > 0:
            input_list.append(input_line.split('\t'))
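
To check the behaviour without the real file, here is a small sketch of the same loop run on an in-memory sample. The sample text and the helper name read_rows are made up for illustration; the three tab-separated columns are an assumption based on the len(input_line) == 3 check in the question. It also shows why a blank line never gives a list of length 0 after split.

```python
import io

# Hypothetical stand-in for WSJ_02-21.pos-chunk: three tab-separated
# columns per line, with a blank line in between (assumed layout).
sample = "In\tIN\tB-PP\nan\tDT\tB-NP\n\nThe\tDT\tB-NP\n"

def read_rows(f):
    """Return one list of fields per non-blank line of the file object f."""
    rows = []
    for line in f:
        stripped = line.strip()
        if stripped:  # skip blank lines
            rows.append(stripped.split("\t"))
    return rows

rows = read_rows(io.StringIO(sample))

# Splitting an empty string yields a list with one empty string,
# so its length is 1, not 0.
blank = "".split("\t")

print(rows)
print(blank, len(blank))
```

With the real file, the io.StringIO(sample) argument would be replaced by the open file object.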

CodePudding user response：

If the line you pass is like:

lines = ["avd\tbdc\tcdc"]

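Your input_line will have 3 tokens (hence will end up in the elif), but input_list[i] has 5 slots (the default length you imposed on each row of input_list), so j runs from 0 to 4. input_line[i] is a single token, i.e. a string, so input_line[i][j] indexes its characters. "avd" only has 3 characters, so at j = 3 you end up out of range: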

input_list[i][j] = input_line[i][j]
IndexError: string index out of range
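
A minimal sketch that reproduces this on the same sample line, followed by one possible fix. The fix assumes each row should simply hold the tokens of its line, which is my reading of the question rather than something it states.

```python
lines = ["avd\tbdc\tcdc"]
input_line = lines[0].split("\t")   # ['avd', 'bdc', 'cdc']

# input_line[i] with i = 0 is the string 'avd', so a second index
# picks out characters, and only indices 0..2 exist.
token = input_line[0]
try:
    token[3]                        # what happens at j = 3
    failed = False
except IndexError:
    failed = True

# Possible fix (assumption): index the token list by column only,
# and loop over its actual length instead of a fixed 5.
row = [input_line[j] for j in range(len(input_line))]

print(failed)
print(row)
```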