How to deal with "String index out of range" in Python

Time:04-03

I'm learning a bit of Python and working through the Python Workbook exercises. Right now I'm stuck on one called Tokenizing a String. In my case the string must be a math expression, and my code must split it into tokens. Here is my code:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while x[j]>="0" and x[j]<="9":
            temp = temp + x[j]
            while j<len(x):
                j=j+1
        if temp!="":
            list.append(temp)
            temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

So the problem is that I can't find a way to keep the index from running out of range. It all comes from that "while" loop. I tried the solution from the book, which is quite similar to mine, but it gives the same result. How can I avoid running out of range when I'm using while and incrementing the counter "j"?

Thanks!

CodePudding user response:

The problem is that you add 1 to j in this block and then index x[j] again without rechecking that j is still in range:

while j < len(x):
    if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
        list.append(x[j])
        j=j+1
    while x[j]>="0" and x[j]<="9":
        temp = temp + x[j]
        while j<len(x):
            j=j+1
    if temp!="":
        list.append(temp)
        temp=""

Let's say j = len(x)-1 and the if condition evaluates to True. This executes the j=j+1 statement, so j is now len(x). When execution reaches the inner while loop, it evaluates x[j]>="0", but x[j] is now x[len(x)]. Since indexing starts at zero, for a string like

a = "abcd"

len(a) is 4, but a[4] does not exist (the last character is a[3]), so the lookup raises an IndexError.
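
The off-by-one can be reproduced in isolation. The following is a minimal sketch that reuses the string a from above; the helper name safe_char is hypothetical and only illustrates checking the index before using it.

```python
a = "abcd"

print(len(a))         # 4
print(a[len(a) - 1])  # d, because the last valid index is len(a) - 1

try:
    a[len(a)]         # index 4 is one past the end
except IndexError as exc:
    error_message = str(exc)
    print("IndexError:", error_message)


def safe_char(s, j):
    """Hypothetical helper: return s[j], or None when j is out of range."""
    return s[j] if 0 <= j < len(s) else None


print(safe_char(a, 3))  # d
print(safe_char(a, 4))  # None
```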

Code with corrections:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        else: # Error 1: Your code needs to execute this only if
              # the above condition fails
            while j<len(x) and x[j].isnumeric(): # 2: You need to check both
                                                 # that the index is in range
                                                 # and that the current
                                                 # character is a digit
                temp = temp + x[j]
                # while j<len(x): No need for this statement
                j=j+1
            if temp!="":
                list.append(temp)
                temp=""
            
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()
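
For comparison, here is a compact sketch of the same idea with a sample run. It is not the book's solution; the names tokenize_sketch and OPERATORS are made up for illustration, and it uses isdigit() and a set lookup as assumptions of mine. It also raises a ValueError for any character that is neither an operator nor a digit, because in the loop above such a character would leave j unchanged and the loop would never finish.

```python
OPERATORS = set("*/+-^()")  # assumed operator set, taken from the if condition above


def tokenize_sketch(expression):
    """Split a math expression into operator, parenthesis, and integer tokens."""
    expression = expression.replace(" ", "")
    tokens = []
    j = 0
    while j < len(expression):
        ch = expression[j]
        if ch in OPERATORS:
            tokens.append(ch)
            j += 1
        elif ch.isdigit():
            start = j
            # The bounds check comes first, so expression[j] is never
            # evaluated once j reaches len(expression).
            while j < len(expression) and expression[j].isdigit():
                j += 1
            tokens.append(expression[start:j])
        else:
            # Assumption: reject anything else instead of looping forever.
            raise ValueError(f"unexpected character {ch!r} at index {j}")
    return tokens


print(tokenize_sketch("12 + 3*(45-6)^2"))
# ['12', '+', '3', '*', '(', '45', '-', '6', ')', '^', '2']
```

Slicing expression[start:j] once per number avoids building the token one character at a time, but the character-by-character version behaves the same.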

CodePudding user response:

I have no idea why, but this works:

def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while j<len(x) and x[j].isnumeric():
            temp=temp+x[j]
            j=j+1
        if temp!="":
            list.append(temp)
            temp=""

    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()

I replaced the x[j]>="0" and x[j]<="9" condition with .isnumeric(), and for some weird reason it now works. To me both conditions look identical. Can anyone explain why this works? I really want to learn how to handle cases like this in the future without losing my sanity!

Thanks
