Find a specific character inside delimiters

I have this string:

(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)

I need to match the commas inside the parentheses for further processing, and I want the commas that separate the groups to remain untouched.

Let's say I want to replace the target commas with FOO; the result should be:

(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)

I want the regular expression itself, not language-specific functions.

CodePudding user response：

You can use a negative lookbehind to match every comma that is not preceded by a ), like this:

(?<!\)),

Regarding "I don't want some language specific functions for this": the regex syntax above is not tied to one language, although the engine must support lookbehind. Here it is in a JavaScript snippet:

const x = '(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)';

const rgx = /(?<!\)),/g;

console.log(x.replace(rgx, ' XXX'));
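
To illustrate that the same pattern carries over to another engine, here is a minimal Python sketch. The shortened input string and the variable names are assumptions made for illustration; only the regex comes from the answer above.

```python
import re

# Shortened sample input (two coordinate pairs), not the full string from the question.
s = "(40.95, -74.18),(40.96, -74.10)"

# Negative lookbehind: match a comma only when the previous character is not ")".
inner_commas = re.compile(r"(?<!\)),")

result = inner_commas.sub(" FOO", s)
print(result)  # (40.95 FOO -74.18),(40.96 FOO -74.10)
```

Note that this relies on the separating commas always following a ) directly, with no whitespace in between.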

CodePudding user response：

For example:

import re

s = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
s = re.sub(r",(?=[^()]*\))", " FOO", s)
print(s)

# (40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)

We use a positive lookahead so that a comma is replaced only when the next parenthesis after it is a closing ), which means the comma sits inside a group.
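
To see which commas the lookahead actually selects, the following sketch lists the match positions on a small, made-up input (the sample string and variable names are assumptions for illustration):

```python
import re

# Small hypothetical input with two groups.
s = "(1.5, -2.5),(3.5, -4.5)"

# A comma qualifies only if, skipping non-parenthesis characters,
# the next parenthesis encountered is a closing ")".
pattern = re.compile(r",(?=[^()]*\))")

positions = [m.start() for m in pattern.finditer(s)]
print(positions)  # [4, 16]
```

The comma at index 11, between the two groups, is skipped because the next parenthesis after it is an opening (.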

CodePudding user response：

Use re.sub with a callback function:

import re

inp = "(40.959953710949506, -74.18210638344726),(40.95891663745299, -74.10606039345703),(40.917472246121065, -74.09582940498359),(40.921752754230255, -74.16397897163398),(40.95248644043785, -74.21067086616523)"
# Capture both numbers of each pair, then rebuild the pair with FOO in place of the comma.
output = re.sub(r'\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)', lambda m: r'(' + m.group(1) + r' FOO ' + m.group(2) + r')', inp)
print(output)

This prints:

(40.959953710949506 FOO -74.18210638344726),(40.95891663745299 FOO -74.10606039345703),(40.917472246121065 FOO -74.09582940498359),(40.921752754230255 FOO -74.16397897163398),(40.95248644043785 FOO -74.21067086616523)

The strategy here is to capture the two numbers in each tuple in separate groups. Then, we replace by connecting the two numbers with FOO instead of the original comma.
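
As a possible variation on the same capture-group idea, the callback can be swapped for a replacement string with backreferences. This is a sketch on a shortened, assumed input rather than the full string from the question:

```python
import re

# Shortened sample input for illustration.
inp = "(40.95, -74.18),(40.96, -74.10)"

# Same pattern as above: group 1 and group 2 hold the two numbers of a pair.
pair = re.compile(r"\((-?\d+(?:\.\d+)?),\s*(-?\d+(?:\.\d+)?)\)")

# \1 and \2 in the replacement refer back to the captured numbers.
output = pair.sub(r"(\1 FOO \2)", inp)
print(output)  # (40.95 FOO -74.18),(40.96 FOO -74.10)
```

A callback is still the better fit when the replacement needs logic beyond reordering the captured text.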