I have written a regular expression to extract geographical coordinates (in decimal format) from strings that follow a specific pattern.
Regex: /(?<lat>\-?\d+\.\d+)(?<comma>\,)(?<long>\-?\d+\.\d+)/gm
Sample strings that work:
11.18531,88.78292
8.34329,123.48655
21.11912,145.85430
36.29781,-13.06121
-16.35564,-36.07065
31.86691,122.28366
38.84814,-140.10204
18.58158,59.20813
Here I want to extract the latitude and longitude values along with their - signs.
But there are some string patterns that I want to reject completely:
-76.84933,-141.64907hel
no-16.3993,-11.2359
-77.7678er,161.9786
69.8149,-l61.9041
5.9333,10.1rr667
jkl11.18531,88.78292hh
However, a few of the above strings still pass regex.test() as true:
-76.84933,-141.64907hel
no-16.3993,-11.2359
5.9333,10.1rr667
jkl11.18531,88.78292hh
For strings like these, I want regex.test() to return false, with no match found at all. They are too contaminated, and I don't want to process them any further.
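A minimal reproduction of the problem, assuming the pattern is used without the g flag (with g, regex.test() keeps state in lastIndex between calls, which would muddy the results). The variable names here are only illustrative:

```javascript
// Pattern from the question, without the g flag so that test() is stateless.
const originalPattern = /(?<lat>\-?\d+\.\d+)(?<comma>\,)(?<long>\-?\d+\.\d+)/;

// The strings that should be rejected completely.
const contaminated = [
  '-76.84933,-141.64907hel',
  'no-16.3993,-11.2359',
  '-77.7678er,161.9786',
  '69.8149,-l61.9041',
  '5.9333,10.1rr667',
  'jkl11.18531,88.78292hh',
];

// Collect the strings that the pattern wrongly accepts.
const falsePositives = contaminated.filter((s) => originalPattern.test(s));
console.log(falsePositives);
```

The pattern is unanchored, so it is satisfied by any coordinate-shaped substring, regardless of the junk before or after it.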
Visit https://regex101.com/r/rT6YvD/1 to see the regex playground I am using.
This is what I am not able to achieve. Please help!!!
Any suggestion accepted gratefully. Thanks in advance.
CodePudding user response:
You can use
(?<lat>-?\b(?<!\b-)\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)\b
See the regex demo. Details:
(?<lat>-?\b(?<!\b-)\d+\.\d+) - Group "lat": an optional -, a word boundary that is not immediately preceded with a - that has a word char in front of it, and then one or more digits, a . and one or more digits
(?<comma>,) - Group "comma": a comma
(?<long>-?\d+\.\d+) - Group "long": an optional -, one or more digits, a . and one or more digits
\b - a word boundary
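As a quick check, here is a sketch that runs this pattern, written as a JavaScript literal without flags, against a few of the question's samples. The array and variable names are made up for illustration, and lookbehind plus named groups require a reasonably recent JavaScript engine:

```javascript
// The word-boundary pattern described above, as a JavaScript regex literal.
const boundedPattern = /(?<lat>-?\b(?<!\b-)\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)\b/;

// Samples taken from the question.
const validSamples = ['11.18531,88.78292', '36.29781,-13.06121', '-16.35564,-36.07065'];
const rejectedSamples = [
  '-76.84933,-141.64907hel',
  'no-16.3993,-11.2359',
  '-77.7678er,161.9786',
  '69.8149,-l61.9041',
  '5.9333,10.1rr667',
  'jkl11.18531,88.78292hh',
];

// Valid strings should yield both named groups, signs included.
for (const s of validSamples) {
  const m = s.match(boundedPattern);
  console.log(s, '->', m ? [m.groups.lat, m.groups.long] : null);
}

// Contaminated strings should produce no match at all.
for (const s of rejectedSamples) {
  console.log(s, '->', boundedPattern.test(s));
}
```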
If you need to make sure the string contains only the coordinates, you can remove all whitespace and then check whether the whole string matches your coordinates pattern:
const match = text
  .replace(/\s+/g, '')
  .match(/^(?<lat>-?\d+\.\d+)(?<comma>,)(?<long>-?\d+\.\d+)$/);
Here, ^ matches the start of the string, and $ matches the end of the string.
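The anchored check can be wrapped in a small helper. This is only a sketch: the function name parseCoordinates and the shape of its return value are assumptions, not part of the original code.

```javascript
// Hypothetical helper: returns { lat, long } as numbers, or null when the
// string contains anything other than a single coordinate pair.
function parseCoordinates(text) {
  const m = text
    .replace(/\s+/g, '') // drop all whitespace before validating
    .match(/^(?<lat>-?\d+\.\d+),(?<long>-?\d+\.\d+)$/);
  return m ? { lat: Number(m.groups.lat), long: Number(m.groups.long) } : null;
}

console.log(parseCoordinates(' 36.29781,-13.06121'));
console.log(parseCoordinates('jkl11.18531,88.78292hh'));
```

Note that stripping all whitespace also accepts strings with spaces inside a number. If that should count as contamination, text.trim() could be used in place of the replace() call.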