Capturing text within the parentheses using Regex

Time:10-07

I have text, which happens to be code, in which I wish to translate certain known (i.e. hard-coded) function names. The translated functions sometimes take an extra argument, so I cannot simply translate the name; I also need to know what the arguments in parentheses are.

Example of original string (an SQL example):

select appx_median(num) as median from test;
select appx_median(cast(num as int)) as median from test;

Translation I would like:

select percentile_approx(num, 0.5) as median from test;
select percentile_approx(cast(num as int), 0.5) as median from test;

The key thing about this last line is that, as you can see, there needs to be some understanding of the hierarchical / recursive nature of the parentheses being used.

I am trying to use a regex to achieve this (in Scala, though this does not really matter I guess) but am having problems due to the recursive parentheses. In the above example, I know how to translate the first line by using the regex /(appx_median)\(([^\)]*)\)/g, but this does not work for the second. Here is a fiddle which shows the first (successful) and second (failed) translations: https://regexr.com/6718e

*** EDIT 1 *** Here are some more examples which seem to break the first proposed solutions:

//nesting translatable functions
select appx_median(appx_median(cast(num as int))) as median from test;
//beautifying
select appx_median(
    cast(num as int)
) as median from test;
//beautifying and nesting
select appx_median(
    appx_median(
        cast(num as int)
    )
) as median from test;

*** EDIT 2 *** It is apparently not possible to deal with these "nested translations" using regex only. I think that the answer proposed is acceptable for many cases if people are aware of this caveat.

CodePudding user response:

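The pattern matches the function name followed by (, then captures the argument list: any run of characters other than parentheses [^()], optionally interleaved with balanced ( ... ) groups one level deep, up to the ) that closes the function call. Group 1 is the argument list including its parentheses, and group 2 is the arguments alone.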

appx_median(\(([^()]*([^()]*\([^()]*\)[^()]*)*)\))

Replace the match with the new function name, an opening (, the arguments captured in $2, the extra argument, and a closing ):

percentile_approx($2, 0.5)
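
Since the question mentions Scala, here is a minimal sketch of applying this pattern with scala.util.matching.Regex. The object and method names (OneLevelTranslate, translate) are made up for illustration, and the replacement puts a space after the comma to match the desired output in the question. It inherits the limitation of the pattern: only one level of nested parentheses inside the call is handled.

```scala
import scala.util.matching.Regex

object OneLevelTranslate {
  // The pattern from the answer. Group 1 is the parenthesised argument list,
  // group 2 is its contents (at most one level of nested parentheses).
  val pattern: Regex =
    """appx_median(\(([^()]*([^()]*\([^()]*\)[^()]*)*)\))""".r

  // Hypothetical helper: rewrites every call the pattern can match.
  // quoteReplacement keeps any '$' or '\' in the arguments literal.
  def translate(sql: String): String =
    pattern.replaceAllIn(
      sql,
      m => Regex.quoteReplacement(s"percentile_approx(${m.group(2)}, 0.5)")
    )
}
```

Calling OneLevelTranslate.translate on the two original statements should give the two desired translations. On the nested example from EDIT 1, only the inner appx_median call is rewritten, which is the caveat noted in EDIT 2.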

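
For the nested and beautified cases from EDIT 1, one option that does not rely on regex is to find the matching closing parenthesis by counting depth, and to translate the arguments recursively. The following is only a sketch under stated assumptions: the names NestedTranslate, matchingParen and translate are hypothetical, the function name is assumed to be followed immediately by (, the search does not check for a word boundary before the name, and parentheses inside SQL string literals or comments are not handled.

```scala
object NestedTranslate {
  // Hard-coded translation, as in the question.
  val from = "appx_median"
  val to = "percentile_approx"
  val extraArg = "0.5"

  // Index of the ')' that closes the '(' at index i, or -1 if unbalanced.
  @annotation.tailrec
  def matchingParen(s: String, i: Int, depth: Int = 0): Int =
    if (i >= s.length) -1
    else s.charAt(i) match {
      case '('               => matchingParen(s, i + 1, depth + 1)
      case ')' if depth == 1 => i
      case ')'               => matchingParen(s, i + 1, depth - 1)
      case _                 => matchingParen(s, i + 1, depth)
    }

  def translate(sql: String): String = {
    val start = sql.indexOf(from + "(")
    if (start < 0) sql // nothing left to translate
    else {
      val open = start + from.length
      val close = matchingParen(sql, open)
      if (close < 0) sql // unbalanced parentheses: leave the text untouched
      else {
        // Translate nested calls inside the argument list first.
        val inner = translate(sql.substring(open + 1, close))
        // Keep trailing whitespace after the extra argument so that
        // "beautified" calls keep their layout.
        val body = inner.take(inner.lastIndexWhere(c => !c.isWhitespace) + 1)
        val trailing = inner.drop(body.length)
        sql.substring(0, start) + to + "(" + body + ", " + extraArg + trailing + ")" +
          translate(sql.substring(close + 1))
      }
    }
  }
}
```

With this sketch, the nested example becomes select percentile_approx(percentile_approx(cast(num as int), 0.5), 0.5) as median from test; and the beautified example keeps its line breaks, with the extra argument placed right after the last original argument.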
