I'm using Python's re library to do this, but it's a basic regex question.
I am receiving a string of coordinate information in degrees-minutes-seconds format without spaces, and I'm parsing it out to discrete coordinate pairs for conversion.
The string is fed to me looking like this (fake coords for example):
102030N0102030E203040N0203040E304050N0304050E405060N0405060E
I am catching it like this:
coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'
coords = re.match(
    re.compile(r"^(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})(\d+[NS]{1}\d+[EW]{1})"),
    coordstr)
for x in coords.groups():
    print(x)
which gives me
102030N0102030E
203040N0203040E
304050N0304050E
405060N0405060E
And it allows me to address each coordinate pair as coords.group(1), coords.group(2) and so on.
So it works, but it feels like I'm being too verbose in the pattern. Is there a more succinct way to crawl the line with a single capture group and add each matched group to .groups() as it's encountered? I know I could do it with brute force string slicing (sketched below), but that seems like more trouble than it's worth.
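(For reference, the brute-force slicing I'm picturing is something like this, assuming every pair is always exactly 15 characters long:)
pairs = [coordstr[i:i + 15] for i in range(0, len(coordstr), 15)]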
I've read this but it doesn't seem to address what I'm going after in this question.
Because this is for an enterprise and these strings describe raster bounds, I will be validating the string before introducing the regex search, and falling back to a gdal object if the string is not found (or corrupted).
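For illustration, the pre-validation I have in mind could be as simple as a full-string pattern check; this is only a sketch, since the real check (and the gdal fallback) isn't written yet:
import re
# hypothetical validator: the whole string must be exactly four DMS pairs
VALID_COORDS = re.compile(r'^(?:\d{6}[NS]\d{7}[EW]){4}$')
def looks_valid(coordstr):
    return VALID_COORDS.match(coordstr) is not None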
CodePudding user response:
Since you will pre-validate the strings you process with regex, you need not use re.search / re.match with several identically patterned groups; you can use re.findall to get all \d+[NS]\d+[EW] pattern matches from your string:
import re
coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'
coords = re.findall(r'\d+[NS]\d+[EW]', coordstr)
for x in coords:
    print(x)
Output:
102030N0102030E
203040N0203040E
304050N0304050E
405060N0405060E
See the Python demo.
NOTE: the list of matches returned by re.findall will always be in the same order as they appear in the source text; see this SO post.
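If you also need the latitude and longitude halves of each pair separately, remember that re.findall returns tuples once the pattern contains more than one capture group, so a variant like this (same test string, just a sketch) yields both parts at once:
import re
coordstr = '102030N0102030E203040N0203040E304050N0304050E405060N0405060E'
# two capture groups -> findall returns (lat, lon) tuples
for lat, lon in re.findall(r'(\d+[NS])(\d+[EW])', coordstr):
    print(lat, lon)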