I tried to make a lexer in Python on Replit because I wanted to try making my own programming language. When I tested it out with half of the lexer working, I got this error:
Traceback (most recent call last):
  File "main.py", line 77, in <module>
    out, error = Lexer(data)
TypeError: cannot unpack non-iterable Lexer object
I've never seen this error before and I don't understand it.
This is my code:
TT_NUM = "Number"
TT_ADD = "Add"
TT_SUB = "Subtract"
TT_MUL = "Multiply"
TT_DIV = "Divide"
TT_STR = "String"
TT_DOT = "Dot"
TT_WS = "Whitespace"
TT_NL = "New Line"
TT_COLON = "Colon"
strs = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ_"
nums = "1234567890"
add = "+"
sub = "-"
mul = "*"
div = "/"
dot = "."
ws = " "
nl = "\n"
class Lexer:
    def __init__(self, data):
        self.data = data
        self.pos = 1
        self.column = 1
        self.row = 1
        self.tokens = []
        self.currentchar = ""
        self.tokenize()

    def tokenize(self):
        result = ""
        token = ""
        out = []
        while self.pos <= len(self.data):
            self.currentchar = self.data[self.pos - 1]
            if self.currentchar in nums:
                result += self.currentchar
                if result[-1] not in strs:
                    token = TT_NUM
                self.pos += 1
                self.column += 1
            elif self.currentchar == add:
                result = self.currentchar
                token = TT_ADD
                self.pos += 1
                self.column += 1
            elif self.currentchar == sub:
                result = self.currentchar
                token = TT_SUB
                self.pos += 1
                self.column += 1
            elif self.currentchar == mul:
                result = self.currentchar
                token = TT_MUL
                self.pos += 1
                self.column += 1
            elif self.currentchar == div:
                result = self.currentchar
                token = TT_DIV
                self.pos += 1
                self.column += 1
            elif self.currentchar == dot:
                out.append(token)
                out.append(result)
                result = self.currentchar
                token = TT_DOT
                self.pos += 1
                self.column += 1
            else:
                return None, "Invalid Character '" + self.currentchar + "' at line " + str(self.row) + ", column " + str(self.column)
        if self.pos > 1:
            return out, None
data = "test"
out, error = Lexer(data)
if error:
    print("Traceback (most recent call last):\n" + error)
What happened that caused this error? How can I fix it?
CodePudding user response:
You called a ctor in this way:
out, error = Lexer(data)
This is a perfectly nice idiom in Lua and Go, and it also works when calling Python methods such as:

out, error = MyLexer(data).parse()

But as written, you just called the constructor, not a method. With a method like .parse() you could make up a return signature any way you like, including a return out, error 2-tuple, which would unpack nicely.
In Python, the result you get from a call like Lexer(...) is typically a single object, not a tuple; __init__() must return None, so it can't change that.
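Here is a minimal reproduction of the same TypeError, stripped down to just the unpacking (the class body is a stand-in, not your actual lexer):

```python
class Lexer:
    def __init__(self, data):
        self.data = data

try:
    # Unpacking tries to iterate the right-hand side, but a plain
    # Lexer instance isn't iterable, so Python raises TypeError.
    out, error = Lexer("test")
except TypeError as exc:
    print(exc)  # cannot unpack non-iterable Lexer object
```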
Perhaps you would like to make your lexer conform to the iterable protocol? You would need to implement __iter__() and __next__() for that to work properly. The OP's code is definitely not attempting to implement the iterable protocol.
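If you did want to go that route, a sketch might look like this (a hypothetical class, not your code; a real lexer would yield richer tokens than single characters):

```python
class IterableLexer:
    """Hypothetical sketch: a lexer usable as `for token in lexer`."""

    def __init__(self, data):
        self.data = data
        self.pos = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.pos >= len(self.data):
            raise StopIteration
        ch = self.data[self.pos]
        self.pos += 1
        return ch  # a real lexer would return a (token_type, text) pair

print(list(IterableLexer("ab")))  # ['a', 'b']
```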
Passing around an error status can be a good approach. An alternate API design to consider, one that is more Pythonic, would be to raise ValueError (or an app-specific error) upon encountering a parse failure.
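That design could look something like this (a simplified stand-in for your tokenizer, with made-up character classes):

```python
def tokenize(data):
    """Hypothetical sketch: raise on bad input instead of returning (out, error)."""
    tokens = []
    for ch in data:
        if ch.isdigit():
            tokens.append(("Number", ch))
        elif ch in "+-*/":
            tokens.append(("Operator", ch))
        else:
            # Callers use try/except instead of checking an error value.
            raise ValueError(f"Invalid character {ch!r}")
    return tokens

print(tokenize("1+2"))
try:
    tokenize("1?")
except ValueError as exc:
    print(exc)
```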
EDIT
On closer examination, it appears you came very close to implementing the design you had in mind. The tokenize() method returns two items; you just never used its return value (the call inside __init__() discards it). Call it explicitly:

out, error = Lexer(data).tokenize()
print(out, error)

(You'll also want to drop the self.tokenize() call from __init__(), or store its result there, so the work isn't done twice.)
Also, rather than returning one giant list, consider yielding items one at a time, with raise ValueError() for an unrecognized character.
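Combining both suggestions, a generator version might be sketched like this (again a hypothetical simplification, not your lexer's actual character classes):

```python
def token_stream(data):
    """Hypothetical sketch: yield (token_type, text) pairs one at a time."""
    for ch in data:
        if ch.isdigit():
            yield ("Number", ch)
        elif ch == "+":
            yield ("Add", ch)
        else:
            # Fail fast instead of passing an error string back to the caller.
            raise ValueError(f"unrecognized character {ch!r}")

for tok in token_stream("1+2"):
    print(tok)
```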
For TT_COLON and similar, consider changing them from str to Enum. Then you'll have an opportunity to tack other attributes onto them, such as the valid characters they match: '*', ':', 'a..z', maybe re.compile(r'[a-z]').
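For instance, an Enum whose values are compiled patterns might look like this (hypothetical names, covering only a few of your token types):

```python
import re
from enum import Enum

class TokenType(Enum):
    """Hypothetical sketch: token kinds carrying their matching pattern."""
    NUM = re.compile(r"[0-9]+")
    ADD = re.compile(r"\+")
    COLON = re.compile(r":")

    def matches(self, text):
        # self.value is the compiled pattern attached to this member.
        return self.value.fullmatch(text) is not None

print(TokenType.NUM.matches("42"))   # True
print(TokenType.COLON.matches("x"))  # False
```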