I tried to make a lexer in Python on Replit because I wanted to try making my own programming language. When I tested it out with half of the lexer working, I got this error:
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    out, error = Lexer(data)
TypeError: cannot unpack non-iterable Lexer object
I've never seen this error before and I don't understand it.
This is my code:
TT_NUM = "Number"
TT_ADD = "Add"
TT_SUB = "Subtract"
TT_MUL = "Multiply"
TT_DIV = "Divide"
TT_STR = "String"
TT_DOT = "Dot"
TT_WS = "Whitespace"
TT_NL = "New Line"
TT_COLON = "Colon"
strs = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
nums = "1234567890"
add = "+"
sub = "-"
mul = "*"
div = "/"
dot = "."
ws = " "
nl = "\n"
class Lexer:
    def __init__(self, data):
        self.data = data
        self.pos = 1
        self.column = 1
        self.row = 1
        self.tokens = []
        self.currentchar = ""
        self.tokenize()

    def tokenize(self):
        result = ""
        token = ""
        out = []
        while self.pos <= len(self.data):
            self.currentchar = self.data[self.pos - 1]
            if self.currentchar in nums:
                result += self.currentchar
                if result[-1] not in strs:
                    token = TT_NUM
                self.pos += 1
                self.column += 1
            elif self.currentchar == add:
                result = self.currentchar
                token = TT_ADD
                self.pos += 1
                self.column += 1
            elif self.currentchar == sub:
                result = self.currentchar
                token = TT_SUB
                self.pos += 1
                self.column += 1
            elif self.currentchar == mul:
                result = self.currentchar
                token = TT_MUL
                self.pos += 1
                self.column += 1
            elif self.currentchar == div:
                result = self.currentchar
                token = TT_DIV
                self.pos += 1
                self.column += 1
            elif self.currentchar == dot:
                out.append(token)
                out.append(result)
                result = self.currentchar
                token = TT_DOT
                self.pos += 1
                self.column += 1
            else:
                return None, "Invalid Character '" + self.currentchar + "' at line " + str(self.row) + ", column " + str(self.column)
        if self.pos > 1:
            return out, None
data = "test"
out, error = Lexer(data)
if error:
    print("Traceback (most recent call last):\n" + error)
What happened that caused this error? How can I fix it?
CodePudding user response:
You called a ctor in this way:
out, error = Lexer(data)
This is a perfectly nice idiom in Lua and Go, and it also works when calling Python methods such as:

out, error = MyLexer(data).parse()

But as written, you just called the constructor, not a method. With a method like .parse() you could make up a return signature any way you like, including a return out, error 2-tuple, which would unpack nicely.
In Python, the result you get from a call like Lexer(...) is typically a single object, not a tuple; __init__() must return None, so it can't change that.
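Here is a minimal reproduction of the same TypeError, stripped down to just the unpacking (the class body is a stand-in, not your actual lexer):

```python
class Lexer:
    def __init__(self, data):
        self.data = data

try:
    # Unpacking tries to iterate the right-hand side, but a plain
    # Lexer instance isn't iterable, so Python raises TypeError.
    out, error = Lexer("test")
except TypeError as exc:
    print(exc)  # cannot unpack non-iterable Lexer object
```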
Perhaps you would like to make your lexer conform to the iterable protocol? You would need to implement __iter__() and __next__() for that to work properly. The OP's code is definitely not attempting to implement the iterable protocol.
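If you did want to go that route, a sketch might look like this (a hypothetical class, not your code; a real lexer would yield richer tokens than single characters):

```python
class IterableLexer:
    """Hypothetical sketch: a lexer usable as `for token in lexer`."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        ch = self.data[self.pos]
        self.pos += 1
        return ch  # a real lexer would return a (token_type, text) pair

print(list(IterableLexer("ab")))  # ['a', 'b']
```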
Passing around an error status can be a good approach. An alternate API design to consider, one that is more Pythonic, would be to raise ValueError (or an app-specific error) upon encountering a parse failure.
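That design could look something like this (a simplified stand-in for your tokenizer, with made-up character classes):

```python
def tokenize(data):
    """Hypothetical sketch: raise on bad input instead of returning (out, error)."""
    tokens = []
    for ch in data:
        if ch.isdigit():
            tokens.append(("Number", ch))
        elif ch in "+-*/":
            tokens.append(("Operator", ch))
        else:
            # Callers use try/except instead of checking an error value.
            raise ValueError(f"Invalid character {ch!r}")
    return tokens

print(tokenize("1+2"))
try:
    tokenize("1?")
except ValueError as exc:
    print(exc)
```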
EDIT
On closer examination, it appears you came very close to implementing the design you had in mind. The tokenize() method returns two items; you just never used its return value (the call inside __init__() discards it). Call it explicitly:

out, error = Lexer(data).tokenize()
print(out, error)

(You'll also want to drop the self.tokenize() call from __init__(), or store its result there, so the work isn't done twice.)
Also, rather than returning one giant list, consider yielding items one at a time, with raise ValueError() for an unrecognized character.
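Combining both suggestions, a generator version might be sketched like this (again a hypothetical simplification, not your lexer's actual character classes):

```python
def token_stream(data):
    """Hypothetical sketch: yield (token_type, text) pairs one at a time."""
    for ch in data:
        if ch.isdigit():
            yield ("Number", ch)
        elif ch == "+":
            yield ("Add", ch)
        else:
            # Fail fast instead of passing an error string back to the caller.
            raise ValueError(f"unrecognized character {ch!r}")

for tok in token_stream("1+2"):
    print(tok)
```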
For TT_COLON and similar, consider changing them from str to Enum. Then you'll have an opportunity to tack other attributes onto them, such as the valid characters they match: '*', ':', 'a..z', maybe re.compile(r'[a-z]').
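For instance, an Enum whose values are compiled patterns might look like this (hypothetical names, covering only a few of your token types):

```python
import re
from enum import Enum

class TokenType(Enum):
    """Hypothetical sketch: token kinds carrying their matching pattern."""
    NUM = re.compile(r"[0-9]+")
    ADD = re.compile(r"\+")
    COLON = re.compile(r":")

    def matches(self, text):
        # self.value is the compiled pattern attached to this member.
        return self.value.fullmatch(text) is not None

print(TokenType.NUM.matches("42"))   # True
print(TokenType.COLON.matches("x"))  # False
```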