I'm learning a bit of Python and working through the Python Workbook exercises. Right now I'm stuck on one called Tokenizing a String. I'm sure you know what that means. In my case the string is a math expression and my code must split it into tokens. Here is my code:
def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while x[j]>="0" and x[j]<="9":
            temp = temp + x[j]
            while j<len(x):
                j=j+1
        if temp!="":
            list.append(temp)
            temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()
So the problem is that I can't find a solution that doesn't run out of range. It all comes from that "while" loop. I tried the solution from the book, which was quite similar to mine, but it gives the same result. How can I avoid running out of range when I'm using while and incrementing a counter (j in my case)?
Thanks!
CodePudding user response:
The problem is that you are adding 1 to j in this block:
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while x[j]>="0" and x[j]<="9":
            temp = temp + x[j]
            while j<len(x):
                j=j+1
        if temp!="":
            list.append(temp)
            temp=""
Let's say j = len(x)-1 and the if statement evaluates to True. This will execute the j=j+1 statement, so j becomes len(x). Now when it enters the while loop, it checks whether x[j]>="0", but x[j] is x[len(x)]. Since indexing starts at zero, for a string like a = "abcd" we have len(a) = 4, but a[4] does not exist (the last element is a[3]), which causes an IndexError.
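To see the failure in isolation, here is a minimal sketch that assumes nothing beyond the four-character string from the explanation above. It also shows why putting the bounds check first helps: `and` stops evaluating as soon as its left side is False, so the index is never used.

```python
a = "abcd"
print(len(a))   # 4
print(a[3])     # d  (the last valid index is len(a) - 1)

try:
    a[4]        # same as a[len(a)], one past the end
except IndexError as exc:
    print("IndexError:", exc)

# Checking the index first avoids the error, because `and` short-circuits:
j = len(a)
in_range_and_digit = j < len(a) and a[j].isdigit()
print(in_range_and_digit)   # False, and a[j] was never evaluated
```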
Code with corrections:
def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            print(x[j])   # debug output
            print(list)   # debug output
            j=j+1
        else:  # Error 1: Your code needs to execute this only if
               # the above condition fails
            while j<len(x) and x[j].isnumeric():  # 2: You need to check both
                                                  # that the index is in range
                                                  # and that the current
                                                  # character is a digit
                temp = temp + x[j]
                # while j<len(x)-1: No need for this statement
                j=j+1
            if temp!="":
                list.append(temp)
                temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()
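For comparison, here is a compact sketch of the same loop. It is not the book's solution or part of the answer above; the names `tokenize` and `OPERATORS` are made up for illustration, and raising ValueError on an unexpected character is my own assumption about how bad input should be handled (without it, a letter in the input would leave j stuck and the loop would never end).

```python
OPERATORS = set("*/+-^()")  # hypothetical name: the single-character tokens


def tokenize(expression):
    """Split a math expression into operator tokens and integer tokens."""
    expression = expression.replace(" ", "")
    tokens = []
    j = 0
    while j < len(expression):
        ch = expression[j]
        if ch in OPERATORS:
            tokens.append(ch)
            j += 1
        elif ch.isdigit():
            start = j
            # The bounds check comes first, so expression[j] is never
            # read once j has moved past the end of the string.
            while j < len(expression) and expression[j].isdigit():
                j += 1
            tokens.append(expression[start:j])
        else:
            # Assumption: anything else is invalid input.
            raise ValueError(f"unexpected character {ch!r} at position {j}")
    return tokens


print(tokenize("12 + 3*(45 - 6)^2"))
# ['12', '+', '3', '*', '(', '45', '-', '6', ')', '^', '2']
```

Every branch either advances j or raises, so the outer loop always terminates.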
CodePudding user response:
I have no idea why, but this works:
def tokenizer(x):
    x=x.replace(" ","")
    list = []
    j=0
    l=len(x)
    temp=""
    while j < len(x):
        if x[j] == "*" or x[j] == "/" or x[j] == "+" or x[j] == "-" or x[j] == "^" or x[j] == "(" or x[j] == ")":
            list.append(x[j])
            j=j+1
        while j<len(x) and x[j].isnumeric():
            temp=temp+x[j]
            j=j+1
        if temp!="":
            list.append(temp)
            temp=""
    return list

def main():
    x=input("Enter math expression: ")
    list=tokenizer(x)
    print("the tokens are: ",list)

if __name__ == '__main__':
    main()
I replaced the x[j]>="0" and x[j]<="9" condition with .isnumeric() and for some weird reason it now works. To me both conditions look identical. Can anyone explain why this works? I really want to learn how to handle cases like this in the future without losing my sanity!
Thanks
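As far as I can tell from the two versions, the two digit tests really do behave the same for ordinary input, and the part that changed the outcome is the j<len(x) check in front of the condition. A small sketch to check this, assuming only printable ASCII input and a sample string "12+3" chosen for illustration:

```python
import string

# For every printable ASCII character, both digit tests give the same answer.
for ch in string.printable:
    assert ("0" <= ch <= "9") == ch.isnumeric()

x = "12+3"
j = len(x)  # one past the last valid index

# Without a bounds check, the old comparison raises before it can compare.
try:
    x[j] >= "0" and x[j] <= "9"
    old_condition_raised = False
except IndexError:
    old_condition_raised = True
print(old_condition_raised)   # True

# With the bounds check in front, either digit test is safe.
print(j < len(x) and "0" <= x[j] <= "9")   # False
print(j < len(x) and x[j].isnumeric())     # False
```

So keeping the original comparison and only adding the bounds check in front of it should work just as well.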