Reading tuple data from file with varying types

Hi everybody. I'm trying to read in this data set and then convert it to a traversable tuple in Python.

(
 (
  (0, 1),
  (q0, q1, q2),
  q0,
  (q2),
  ((q0, 0, q0), (q0, 1, q0),(q0, 0, q1),(q1, 1, q2))
 ), 
 ()
)

The issue seems to be that the data types are mixed (integers alongside bare names like q0), so it can't just be scrubbed and read in.

Thanks for any help y'all can give!

Edit: Current, albeit small, non-working code:

filename = "dataset1.txt"


with open(filename, 'r') as dataset:
    data = ""
    for line in dataset:
        data += line.strip().replace(' ', '')
    print(data)

    print(tuple(data))

which generates the output

('(', '(', '(', '0', ',', '1', ')', ',', '(', 'q', '0', ',', 'q', '1', ',', 'q', '2', ')', ',', 'q', '0', ',', '(', 'q', '2', ')', ',', '(', '(', 'q', '0', ',', '0', ',', 'q', '0', ')', ',', '(', 'q', '0', ',', '1', ',', 'q', '0', ')', ',', '(', 'q', '0', ',', '0', ',', 'q', '1', ')', ',', '(', 'q', '1', ',', '1', ',', 'q', '2', ')', ')', ')', ')', ',', '(', ')', ')')

CodePudding user response:

Python has a built-in abstract syntax tree library, ast, that lets you walk through Python code. The challenge here is that you have tokens like q0 that Python interprets as names. This is easily handled by replacing each of those names with a string constant, then evaluating the tuple:

import ast

s = '''(
 (
  (0, 1),
  (q0, q1, q2),
  q0,
  (q2),
  ((q0, 0, q0), (q0, 1, q0),(q0, 0, q1),(q1, 1, q2))
 ), 
 ()
)'''

class Names2Strings(ast.NodeTransformer):
    def visit_Name(self, node):
        # ast.Str was removed in Python 3.12; ast.Constant is its replacement
        return ast.copy_location(ast.Constant(value=node.id), node)

tree = ast.parse(s, mode='eval')        
data = eval(compile(Names2Strings().visit(tree), filename="<string>", mode="eval"))

data will be the tuple structure you want:

(((0, 1),
  ('q0', 'q1', 'q2'),
  'q0',
  'q2',
  (('q0', 0, 'q0'), ('q0', 1, 'q0'), ('q0', 0, 'q1'), ('q1', 1, 'q2'))),
 ())

Note: in Python, (q2) is not a tuple. It is just a parenthesized single value, and the output reflects that. A one-element tuple would have to be written (q2,).
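If you would rather not call eval on file contents, a variation on the same idea is to quote the names with a regular expression and hand the result to ast.literal_eval, which only accepts literals. This is a minimal sketch, assuming every bare identifier in the file is meant to be a string (so a word like True or None would be quoted as well):

```python
import ast
import re

text = '''(
 (
  (0, 1),
  (q0, q1, q2),
  q0,
  (q2),
  ((q0, 0, q0), (q0, 1, q0),(q0, 0, q1),(q1, 1, q2))
 ),
 ()
)'''

# Wrap every bare identifier (a letter or underscore followed by word
# characters) in quotes so it becomes a string literal.
quoted = re.sub(r"\b[A-Za-z_]\w*\b", lambda m: repr(m.group(0)), text)

# literal_eval only builds literals (numbers, strings, tuples, ...),
# so nothing from the file is executed as code.
data = ast.literal_eval(quoted)
print(data)
```

As with the NodeTransformer version, (q2) comes back as the plain string 'q2' rather than a one-element tuple.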

CodePudding user response:

You have to build a custom parser here.

Here is one possible implementation that parses an iterator yielding one character at a time:

def tup_parse(it, top=True):
    def gettok(toklist):
        token = ''.join(toklist)
        try:
            token = int(token)
        except ValueError:
            pass
        toklist.clear()
        return token
    data = []
    toklist= []
    lastisparen = False
    try:
        while True:
            c = next(it)
            if c == '(':
                data.append(tup_parse(it, False))
                c = ')'
            elif c == ')':
                if len(toklist) > 0:
                    data.append(gettok(toklist))
                return tuple(data)
            elif c == ',':
                if not lastisparen:
                    data.append(gettok(toklist))
            elif str.isspace(c):
                continue  # skip whitespace without resetting lastisparen
            else:
                toklist.append(c)
            lastisparen = (c == ')')
    except StopIteration:
        if not top:
            raise
        if len(toklist) > 0:
            data.append(gettok(toklist))
        return tuple(data)

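If you want to parse a string (say s), you can pass an iterator over it directly. Note that the top-level call collects everything it reads into an outer tuple, so for the sample data the parsed structure is tup[0]. Unlike Python's own syntax, this parser returns (q2) as the one-element tuple ('q2',):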

tup = tup_parse(iter(s))

If you want to parse a file, you can first build a character generator:

def genit(fd):
    while True:
        c = fd.read(1)
        if c != '':
            yield c
        else:
            return

and then use:

tup = tup_parse(genit(fd))
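A shorter way to get the same character stream is the two-argument form of iter, which keeps calling a function until it returns a sentinel value. This is only a sketch: char_iter is a hypothetical helper name, and io.StringIO stands in for an open text file.

```python
import io
from functools import partial


def char_iter(fd):
    """Return an iterator over single characters of fd.

    Iteration stops when fd.read(1) returns '' (end of file).
    """
    return iter(partial(fd.read, 1), "")


# io.StringIO stands in for a real file opened in text mode.
fake_file = io.StringIO("(q0, 1)")
chars = list(char_iter(fake_file))
print(chars)
```

With a real file this would be used the same way as genit, for example inside `with open("dataset1.txt") as fd:` followed by `tup = tup_parse(char_iter(fd))`.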