Regular Expression (Regex) not working for all samples of geographical coordinates (decimal format)

Time:05-05

I have created a regular expression to extract geographical coordinates (decimal format) data from a specific pattern of strings.
Regex: /(?<lat>\-?\d+\.\d+)(?<comma>\,)(?<long>\-?\d+\.\d+)/gm

Sample strings that work: 11.18531,88.78292, 8.34329,123.48655, 21.11912,145.85430, 36.29781,-13.06121, -16.35564,-36.07065, 31.86691,122.28366, 38.84814,-140.10204, 18.58158,59.20813

Here I want to extract the latitude and longitude values with their - signs.

But there are some string patterns that I want to reject completely: -76.84933,-141.64907hel, no-16.3993,-11.2359, -77.7678er,161.9786, 69.8149,-l61.9041, 5.9333,10.1rr667, jkl11.18531,88.78292hh

However, a few of the strings above still return true from regex.test(): -76.84933,-141.64907hel, no-16.3993,-11.2359, 5.9333,10.1rr667, jkl11.18531,88.78292hh. For strings like these, I want it to return false, with no match found. They are too contaminated, and I don't want to process them any further.

Visit https://regex101.com/r/rT6YvD/1 to see the regex playground I am using.

This is what I am not able to achieve. Please help!!!

Any suggestion accepted gratefully. Thanks in advance.

CodePudding user response:

You can use

(?<lat>-?\b(?<!\b-)\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)\b

See the regex demo. Details:

  • (?<lat>-?\b(?<!\b-)\d+\.\d+) - Group "lat": an optional -, a word boundary that is not immediately preceded by a - that has a word char in front of it, and then one or more digits, . and one or more digits
  • (?<comma>,) - Group "comma": a comma
  • (?<long>-?\d+\.\d+) - Group "long": an optional -, one or more digits, . and one or more digits
  • \b - a word boundary
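
To see how the boundary checks behave, here is a minimal sketch that runs the pattern against some of the samples from the question. The names coordPattern and extractCoordinates are made up for illustration, and the snippet assumes a JavaScript engine that supports named groups and lookbehind (recent Node.js and most current browsers do).

```javascript
// Pattern from the answer, written without the g flag so that
// repeated exec() calls do not carry lastIndex state between strings.
const coordPattern =
  /(?<lat>-?\b(?<!\b-)\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)\b/;

// Hypothetical helper: returns { lat, long } as strings, or null if nothing matches.
function extractCoordinates(text) {
  const m = coordPattern.exec(text);
  return m ? { lat: m.groups.lat, long: m.groups.long } : null;
}

// Samples taken from the question.
const shouldMatch = [
  "11.18531,88.78292",
  "36.29781,-13.06121",
  "-16.35564,-36.07065",
];
const shouldFail = [
  "-76.84933,-141.64907hel",
  "no-16.3993,-11.2359",
  "-77.7678er,161.9786",
  "69.8149,-l61.9041",
  "5.9333,10.1rr667",
  "jkl11.18531,88.78292hh",
];

for (const s of shouldMatch) console.log(s, "=>", extractCoordinates(s));
for (const s of shouldFail) console.log(s, "=>", extractCoordinates(s));
```

The leading \b rejects digits glued to letters (jkl11...), the lookbehind rejects a minus sign glued to a word (no-16...), and the trailing \b rejects letters glued to the last digit (...64907hel).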

If you need to make sure the string contains only the coordinates, you can remove all whitespace and then check whether the whole string matches your coordinates pattern:

const match = text
    .replace(/\s+/g, '')
    .match(/^(?<lat>-?\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)$/);

Here, ^ matches the start of string, and $ matches the end of string.
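
The whole-string check could be wrapped in a small function, sketched below. The name parseCoordinatePair and the decision to return numbers are assumptions for illustration, not part of the original answer.

```javascript
// Anchored pattern: the entire (whitespace-stripped) string must be "lat,long".
const wholePattern = /^(?<lat>-?\d+\.\d+),(?<long>-?\d+\.\d+)$/;

// Hypothetical helper: returns { lat, long } as numbers, or null for contaminated input.
function parseCoordinatePair(text) {
  const m = text.replace(/\s+/g, "").match(wholePattern);
  if (!m) return null;
  return { lat: Number(m.groups.lat), long: Number(m.groups.long) };
}

console.log(parseCoordinatePair(" 21.11912,145.85430"));
console.log(parseCoordinatePair("-76.84933,-141.64907hel"));
console.log(parseCoordinatePair("no-16.3993,-11.2359"));
```

Note that stripping all whitespace also joins digits that were separated by a space inside a number, so "1 1.5,2.5" would be read as 11.5,2.5. If that matters, text.trim() may be a safer choice than the replace call.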
