Hi everybody. I'm trying to read in this data set and then convert it to a traversable tuple in Python.
(
(
(0, 1),
(q0, q1, q2),
q0,
(q2),
((q0, 0, q0), (q0, 1, q0),(q0, 0, q1),(q1, 1, q2))
),
()
)
The issue seems to be that the data mixes types (integers such as 0 and bare names such as q0), so it can't simply be cleaned up and read in.
Thanks for any help y'all can give!
Edit: Current, albeit small, non-working code:
filename = "dataset1.txt"
with open(filename, 'r') as dataset:
    data = ""
    for line in dataset:
        # Accumulate every line with whitespace removed
        data += line.strip().replace(' ', '')
print(data)
print(tuple(data))
which generates the output
('(', '(', '(', '0', ',', '1', ')', ',', '(', 'q', '0', ',', 'q', '1', ',', 'q', '2', ')', ',', 'q', '0', ',', '(', 'q', '2', ')', ',', '(', '(', 'q', '0', ',', '0', ',', 'q', '0', ')', ',', '(', 'q', '0', ',', '1', ',', 'q', '0', ')', ',', '(', 'q', '0', ',', '0', ',', 'q', '1', ')', ',', '(', 'q', '1', ',', '1', ',', 'q', '2', ')', ')', ')', ')', ',', '(', ')', ')')
CodePudding user response:
Python has a built-in abstract syntax tree library, ast, that allows you to walk through Python code. The challenge here is that you have tokens like q0 that Python interprets as names. This is easily handled by replacing all of those names with strings, then evaluating the tuple:
import ast

s = '''(
(
(0, 1),
(q0, q1, q2),
q0,
(q2),
((q0, 0, q0), (q0, 1, q0),(q0, 0, q1),(q1, 1, q2))
),
()
)'''

class Names2Strings(ast.NodeTransformer):
    def visit_Name(self, node):
        # Replace a bare name such as q0 with the string constant 'q0'.
        # ast.Constant is used because ast.Str was removed in Python 3.12.
        return ast.copy_location(ast.Constant(value=node.id), node)

tree = ast.parse(s, mode='eval')
data = eval(compile(Names2Strings().visit(tree), filename="<string>", mode="eval"))
data will be the tuple structure you want:
(((0, 1),
  ('q0', 'q1', 'q2'),
  'q0',
  'q2',
  (('q0', 0, 'q0'), ('q0', 1, 'q0'), ('q0', 0, 'q1'), ('q1', 1, 'q2'))),
 ())
Note: in Python, (q2) is not a tuple. It's just a parenthesized single value, and the output reflects that. A one-element tuple needs a trailing comma, as in (q2,).
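As a variation on the approach above, here is a minimal sketch that swaps eval(compile(...)) for ast.literal_eval, which also accepts an AST node and only evaluates literal structures. The names NamesToStrings and parse_tuple_text are made up for this illustration, and the commented-out file usage assumes the dataset1.txt file from the question.

```python
import ast


class NamesToStrings(ast.NodeTransformer):
    """Replace bare names such as q0 with string constants such as 'q0'."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Constant(value=node.id), node)


def parse_tuple_text(text):
    """Parse tuple-like text whose leaves are integers or bare names."""
    tree = ast.parse(text.strip(), mode="eval")
    tree = NamesToStrings().visit(tree)
    # literal_eval only accepts literal structures (tuples, numbers,
    # strings, ...), so calls or attribute access raise ValueError.
    return ast.literal_eval(tree)


# Hypothetical usage with the file name from the question:
# with open("dataset1.txt") as dataset:
#     data = parse_tuple_text(dataset.read())

single = parse_tuple_text("(q2)")       # parenthesized value, not a tuple
one_tuple = parse_tuple_text("(q2,)")   # trailing comma makes a 1-tuple
nested = parse_tuple_text("((0, 1), (q0, q1), ())")
```

Here single is the plain string 'q2', while one_tuple is a tuple containing that one string, which illustrates the note about (q2) above.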
CodePudding user response:
One option is to build a custom parser here.
Here is one possible implementation that parses an iterator yielding one character at a time:
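def tup_parse(it, top=True):
    def gettok(toklist):
        # Join the collected characters; convert to int when possible
        token = ''.join(toklist)
        try:
            token = int(token)
        except ValueError:
            pass
        toklist.clear()
        return token

    data = []
    toklist = []
    lastisparen = False
    try:
        while True:
            c = next(it)
            if c == '(':
                # Nested tuple: parse it recursively
                data.append(tup_parse(it, False))
                c = ')'
            elif c == ')':
                if len(toklist) > 0:
                    data.append(gettok(toklist))
                return tuple(data)
            elif c == ',':
                # A comma right after a nested tuple ends no token
                if not lastisparen:
                    data.append(gettok(toklist))
            elif c.isspace():
                # Skip whitespace without resetting lastisparen, so that
                # "(a) , b" does not produce an empty token
                continue
            else:
                toklist.append(c)
            lastisparen = (c == ')')
    except StopIteration:
        # Running out of input is only valid at the top level
        if not top:
            raise
        if len(toklist) > 0:
            data.append(gettok(toklist))
        return tuple(data)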
If you want to parse a string (say s), you can use it directly:
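tup = tup_parse(iter(s))
Note that the top-level call collects everything it parses into a tuple, so for the data set in the question the structure itself is tup[0]. Unlike Python syntax, this parser also treats (q2) as a one-element tuple.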
If you want to parse a file, you can first build a character generator:
def genit(fd):
    while True:
        c = fd.read(1)
        if c != '':
            yield c
        else:
            # read(1) returns '' at end of file
            return
and then use it with an open file object fd:
tup = tup_parse(genit(fd))
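The character generator can also be written more compactly with the two-argument form of the built-in iter, which keeps calling a function until it returns a sentinel value. This is only a sketch: char_iter is a made-up name, and io.StringIO stands in for a real open file.

```python
import io
from functools import partial


def char_iter(fd):
    """Yield one character at a time until read(1) returns '' (end of file)."""
    return iter(partial(fd.read, 1), '')


fd = io.StringIO("(q0, 1)")   # stand-in for an open text file
chars = list(char_iter(fd))
```

The resulting iterator can be passed to a character-based parser in the same way as the generator above.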