regexp_replace is replacing my string with multiple periods in between capture groups when I only wa

Time:12-09

Hi, I am using regexp_replace in a SQL query on AWS Athena to change a version number in a user agent from 16_0 to 16.0.

Instead of getting 16.0 back, I am getting 16.0.. back. If I remove the . between $1 and $2, I get 160, which makes me believe my regex is correct. For some reason, though, the replacement adds two periods at the end when I only want a single period in the middle.

This is an example of one of the user agents (note that I am also parsing iPad user agents):

Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/108.0.5359.52 Mobile/15E148 Safari/604.1

This is my regex replace statement:

    regexp_replace(
        useragent,
        '.*?(?<=OS )(\d+)_(\d{1,2})|.*',
        '$1.$2'
      )

Edited to add: I have tried a variety of regex combinations (greedy quantifiers, no lookbehind, etc.) and tested them on regex101, but to no avail.

AHA! I believe it is adding two periods because it is getting two more matches whose capture groups are null, and it adds a period for each of them. I need to figure out how to stop that. I am relatively new to regex.

CodePudding user response:

The problem comes from the fact you are not matching the rest of the string after the version in the first alternative, and the second .* alternative matches the rest of the string, and then the end of string position.
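The three matches described above can be reproduced outside Athena. This is only a sketch using Python's `re` module (3.7 or later), which is an assumption for illustration: Athena uses its own regex engine and writes the replacement as `'$1.$2'` rather than `r'\1.\2'`, but the same mechanism applies to this pattern.

```python
import re

# User agent copied from the question.
UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) "
      "AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/108.0.5359.52 "
      "Mobile/15E148 Safari/604.1")

# Pattern from the question; Python spells the replacement r'\1.\2'.
PATTERN = r'.*?(?<=OS )(\d+)_(\d{1,2})|.*'


def show_matches(text):
    """Return (matched_text, group1, group2) for each successive match."""
    return [(m.group(0), m.group(1), m.group(2))
            for m in re.finditer(PATTERN, text)]


matches = show_matches(UA)
result = re.sub(PATTERN, r'\1.\2', UA)

if __name__ == "__main__":
    for matched, major, minor in matches:
        print(repr(matched), major, minor)
    print(result)  # 16.1..
```

The first match carries the version, the second is the rest of the string matched by `.*` with both groups unset, and the third is the empty match at the end of the string. Each of the last two contributes one bare `.` to the output.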

You can use this regex

.*?OS (\d+)_(\d{1,2}).*

to capture the version parts and consume the whole string. Note that this pattern deliberately has no alternative that matches the whole string when it contains no version: with such an alternative, $1 and $2 are empty, so the output would be a lone . character. Without it, a string that has no version is simply left unchanged. If you still need that behavior, add the alternative back, but use .+ instead of .*:

.*?OS (\d+)_(\d{1,2}).*|.+

to match at least one char other than line break chars, as many as possible.
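To compare the two suggested patterns, here is a small sketch in Python's `re` module. The engine is again an assumption (it is not Athena's), and `NO_VERSION` is a made-up input added only to show the case where no version is present.

```python
import re

# User agent copied from the question.
UA = ("Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) "
      "AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/108.0.5359.52 "
      "Mobile/15E148 Safari/604.1")

# Hypothetical input with no "OS <major>_<minor>" token (not from the question).
NO_VERSION = "curl/7.85.0"

# The two patterns from the answer.
FIXED = r'.*?OS (\d+)_(\d{1,2}).*'
WITH_FALLBACK = r'.*?OS (\d+)_(\d{1,2}).*|.+'


def version(pattern, text):
    """Apply the replacement; r'\\1.\\2' plays the role of Athena's '$1.$2'."""
    return re.sub(pattern, r'\1.\2', text)


if __name__ == "__main__":
    print(version(FIXED, UA))                  # 16.1
    print(version(FIXED, NO_VERSION))          # curl/7.85.0 (no match, unchanged)
    print(version(WITH_FALLBACK, UA))          # 16.1
    print(version(WITH_FALLBACK, NO_VERSION))  # .
```

Because both patterns consume the whole string in a single match and neither can match an empty string, there is no leftover text and no empty match at the end to produce extra periods.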
