Clean up irregularities in dictionary values using regex

Time:11-04

I need to create a dictionary from a text file that contains coordinates for named polygons. The output needs to be a dictionary where the polygon name is the key and corresponding x and y coordinates are the values. Most of the entries in the file follow a standard layout as follows:

Name of polygon
(12.345, 1.2567)
(5.6789, 2.9876)
(9.0345, 3.7654)
(3.4556, 2.3445)

Name of next polygon
(x, y values)

However, some entries are irregular: all the values may be on one line, or there may be extra characters between the parentheses. I need to loop over the values and split out each coordinate pair contained in parentheses.

So far I have created the dictionary in an initial pass over the file and am trying to use regex to split the values based on contents of parentheses:

import re

with open(fpath, 'r') as infile:
    d = {}

    # split the data into keys and values
    for group in infile.read().split('\n\n'):
        entry = group.split('\n')
        key, *val = entry
        d[key] = val

    for value in d.values():
        value = re.split("*[\(. $\)]*", str(value))

print(d)

I was hoping that this would clean up the values and create an individual value for each set of coordinates contained in parentheses, but I am getting the following error:

re.error: nothing to repeat at position 0
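
For reference, the message points at the leading `*` in the pattern: a quantifier needs something before it to repeat, and at position 0 there is nothing. A minimal reproduction, using the same pattern as the snippet above and only the standard `re` module:

```python
import re

# Same pattern as in the question; the leading '*' has no preceding token.
pattern = r"*[\(. $\)]*"

try:
    re.compile(pattern)
    message = None
except re.error as exc:
    # Compilation fails before any input text is examined.
    message = str(exc)

print(message)
```

Because the failure happens when the pattern is compiled, the contents of the dictionary values do not matter here.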

CodePudding user response:

I think I've found a solution to my problem. I needed to account for multiple values per key in the loop and use re.findall() instead of re.split(). So my final loop looks like:

for key, value in d.items():
    # non-greedy match so each "(...)" pair becomes its own item
    d[key] = re.findall(r"\(.+?\)", str(value))
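
As a rough end-to-end sketch of the same idea, the file can be split into blank-line-separated groups and `re.findall()` applied to each group's body. The function name `parse_polygons`, the `PAIR` pattern, and the sample data are illustrative assumptions, not part of the original code. It also assumes the stray characters sit between the coordinate pairs rather than inside the parentheses, and it converts each pair to floats.

```python
import re

# Captures the two numbers of an "(x, y)" pair. Anything that does not
# match (separators, stray characters between pairs) is skipped by findall.
PAIR = re.compile(r"\(\s*(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)\s*\)")


def parse_polygons(text):
    """Map each polygon name to a list of (x, y) float tuples."""
    polygons = {}
    # Groups are separated by one or more blank lines.
    for group in re.split(r"\n\s*\n", text.strip()):
        name, *rest = group.split("\n")
        # Join the remaining lines so one-line and multi-line
        # entries are handled the same way.
        body = " ".join(rest)
        polygons[name.strip()] = [
            (float(x), float(y)) for x, y in PAIR.findall(body)
        ]
    return polygons


# Hypothetical sample: one regular entry, one irregular entry.
sample = """Square
(0.0, 0.0)
(1.0, 0.0)
(1.0, 1.0)
(0.0, 1.0)

Triangle
(0.0, 0.0) (2.5, 0.0); (1.25, 3.0)
"""

result = parse_polygons(sample)
print(result)
```

If the irregular entries have extra characters inside the parentheses instead, the `PAIR` pattern would need to be loosened to match them.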