I need to create a dictionary from a text file that contains coordinates for named polygons. The output needs to be a dictionary where the polygon name is the key and corresponding x and y coordinates are the values. Most of the entries in the file follow a standard layout as follows:
Name of polygon
(12.345, 1.2567)
(5.6789, 2.9876)
(9.0345, 3.7654)
(3.4556, 2.3445)
Name of next polygon
(x, y values)
However there are some entries that have irregularities such as all the values are on one line or have extra characters between the parentheses. I need to loop over the values and split the values contained in parentheses.
So far I have created the dictionary in an initial pass over the file and am trying to use regex to split the values based on contents of parentheses:
with open(fpath, 'r') as infile:
d = {}
#split the data into keys and values
for group in infile.read().split('\n\n'):
entry = group.split('\n')
key, *val = entry
d[key] = val
for value in d.values():
value = re.split("*[\(. $\)]*", str(value))
print(d)
I was hoping that this would clean up the values and create individual values for each set of coordinates contained in the parentheses, however I am getting the following error:
re.error: nothing to repeat at position 0
CodePudding user response:
I think I've found a solution to my problem. I needed to account for multiple values per key in the loop and use re.findall()
instead of re.split()
. So my final loop looks like:
for key, *value in d.items():
d[key] = re.findall("\(. \)", str(value))