I have a .txt file in the format below. How can I load it into a dictionary using PySpark?
{'event1': {'nPrev': 4,
'adj': array([ 10, 2, 30]),
'multiply': False,
'closing_raw': 50,
'closing_adj': 60},
'event2': {'nPrev': 4,
'adj': array([ 40, 50, 60]),
'multiply': False},
'event3': {'nPrev': 4,
'adj': array([ 30, 10, 30]),
'multiply': False},
'event4': {'nPrev': 3,
'adj': array([ 20, 10, 30]),
'multiply': False}}
CodePudding user response:
Using eval directly can be dangerous. Here's how you can get rid of the array calls and use the safer and stricter ast.literal_eval:
text = """
{'event1': {'nPrev': 4,
'adj': array([ 10, 2, 30]),
'multiply': False,
'closing_raw': 50,
'closing_adj': 60},
'event2': {'nPrev': 4,
'adj': array([ 40, 50, 60]),
'multiply': False},
'event3': {'nPrev': 4,
'adj': array([ 30, 10, 30]),
'multiply': False},
'event4': {'nPrev': 3,
'adj': array([ 20, 10, 30]),
'multiply': False}}
"""
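import re
import ast
# Replace each array(...) call with its contents, e.g. array([1, 2]) -> [1, 2]
without_arrays = re.sub(r"array\((.+?)\)", r"\1", text)
# literal_eval only accepts Python literals, so it cannot run arbitrary code
parsed = ast.literal_eval(without_arrays.strip())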
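Since the question starts from a file rather than a string, here is a minimal sketch of the same approach wrapped in a function that reads the file first. The name load_event_dict and the use of pathlib are my own choices, not something from the question. With PySpark, the text could presumably be obtained through SparkContext.wholeTextFiles instead of Path.read_text, but that part is not shown or tested here.

```python
import ast
import re
from pathlib import Path

# Matches array(...) and captures what is inside the parentheses.
# Assumes the array contents contain no ")" themselves.
ARRAY_CALL = re.compile(r"array\((.+?)\)", re.DOTALL)


def load_event_dict(path):
    """Hypothetical helper: read a dict repr from a text file without eval."""
    text = Path(path).read_text()
    # Turn array([10, 2, 30]) into the plain list [10, 2, 30]
    without_arrays = ARRAY_CALL.sub(r"\1", text)
    return ast.literal_eval(without_arrays.strip())
```

The adj values come back as plain Python lists. If you need numpy arrays, you can convert them afterwards.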
If there's a chance of array(...) appearing inside a string where it shouldn't be removed, then I can show a more robust method using ast to only remove the calls.
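A more robust method along those lines might look like the sketch below. It parses the text into a syntax tree, replaces only real calls to array with their argument, and hands the result to ast.literal_eval. The names ArrayCallRemover and parse_with_arrays are made up for this example, and it assumes every array call has exactly one positional argument and no keyword arguments such as dtype.

```python
import ast


class ArrayCallRemover(ast.NodeTransformer):
    """Replace every call array(x) with its single argument x."""

    def visit_Call(self, node):
        self.generic_visit(node)  # handle any nested calls first
        is_array_call = (
            isinstance(node.func, ast.Name)
            and node.func.id == "array"
            and len(node.args) == 1
            and not node.keywords
        )
        # Anything that is not a plain array(x) call is left untouched,
        # so literal_eval will still reject it later.
        return node.args[0] if is_array_call else node


def parse_with_arrays(text):
    """Hypothetical helper: parse a dict repr that contains array(...) calls."""
    tree = ast.parse(text.strip(), mode="eval")
    tree = ArrayCallRemover().visit(tree)
    return ast.literal_eval(tree)
```

Because the replacement happens on the syntax tree, a string value such as 'array(3)' is a constant rather than a call and stays as it is.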
CodePudding user response:
You can use eval to construct the dictionary. The only problem is that array is not a Python built-in, so the name has to be defined before the text is evaluated. Importing it from numpy takes care of that.
You can do the following:
txt="""{'event1': {'nPrev': 4,
'adj': array([ 10, 2, 30]),
'multiply': False,
'closing_raw': 50,
'closing_adj': 60},
'event2': {'nPrev': 4,
'adj': [ 40, 50, 60],
'multiply': False},
'event3': {'nPrev': 4,
'adj': array([ 30, 10, 30]),
'multiply': False},
'event4': {'nPrev': 3,
'adj': array([ 20, 10, 30]),
'multiply': False}}"""
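# array must be in scope so that eval can resolve the array(...) calls
from numpy import array
# eval runs whatever code the text contains, so only use it on files you trust
my_dict = eval(txt)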
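If numpy isn't available, or plain lists are enough, a variation on the same idea is to give eval its own namespace in which array simply means list. This is only a sketch: eval_with_array_as_list is a name I made up, and emptying __builtins__ narrows what the text can reach but does not make eval safe for untrusted input.

```python
def eval_with_array_as_list(text):
    """Hypothetical helper: evaluate a dict repr, treating array(...) as list(...)."""
    # Only the name "array" is provided; built-in functions are not exposed.
    # This is NOT a real sandbox, so the input should still be trusted.
    namespace = {"__builtins__": {}, "array": list}
    return eval(text.strip(), namespace)
```

Here array([ 10, 2, 30]) evaluates to the list [10, 2, 30], and a value that is already a plain list is left unchanged.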