Regex to extract text between two character patterns

Time:11-02

I have multiple rows of data that look like the following:

dgov-nonprod-adp-personal.groups

dgov-prod-gcp-sensitive.groups

I want to get the text between the last hyphen and the period, so:

personal

sensitive

I have this regex:

(?:prod-(.*)-)(.*).groups

However, it returns two groups, and in BigQuery I can only extract when the pattern has a single group. What regex would extract just the text I want?

Note: the segment between the first and second hyphens will always be prod or nonprod. That's why my original regex uses prod-, since that part is constant.
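You can see the two-group problem by compiling the pattern and counting its capturing groups. This sketch uses Python's `re` module instead of BigQuery. That is an assumption made only for illustration: BigQuery uses the RE2 engine, but these constructs behave the same way on these inputs. The two sample rows come from the question.

```python
import re

# The question's original pattern. The "." before "groups" is unescaped,
# so it matches any single character, not only a literal period.
ORIGINAL = re.compile(r"(?:prod-(.*)-)(.*).groups")

rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
]

# Pattern.groups is the number of capturing groups in the pattern.
print(ORIGINAL.groups)  # 2

for row in rows:
    match = ORIGINAL.search(row)
    # Both groups are populated: the middle segment and the wanted one.
    print(match.groups())
# ('adp', 'personal')
# ('gcp', 'sensitive')
```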

Answer:

Assuming the BigQuery function you are using supports a capture group, I would phrase your requirement as:

([^-]+)\.groups$

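Here is a minimal check of this pattern against the question's rows. Like the earlier illustration, it uses Python's `re` instead of BigQuery. `extract` is a hypothetical helper that loosely stands in for a single-group extract function. It returns capture group 1 of the first match, or `None`.

```python
import re

# Answer 1: capture the last run of non-hyphen characters that sits
# directly before ".groups" at the end of the string.
LAST_SEGMENT = re.compile(r"([^-]+)\.groups$")


def extract(pattern, text):
    """Hypothetical helper: return capture group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
]

print(LAST_SEGMENT.groups)  # 1 capturing group
for row in rows:
    print(extract(LAST_SEGMENT, row))
# personal
# sensitive
```

The pattern is anchored with `$`, and `[^-]` cannot cross a hyphen, so it does not depend on the prod/nonprod segment at all.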

Answer:

For the example data, you can make the pattern a bit more specific by matching -nonprod or -prod, still using a single capture group:

-(?:non)?prod-[^-]+-([^-]+)\.groups$

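The stricter pattern can be checked the same way in Python's `re`. The third row, `dgov-dev-gcp-sensitive.groups`, is a hypothetical input that does not appear in the question. It has neither prod nor nonprod and shows that the pattern rejects such rows. `extract` is the same hypothetical helper, redefined here so the block stands alone.

```python
import re

# Answer 2: require "-prod-" or "-nonprod-", then exactly one segment,
# then capture the final segment before ".groups".
PROD_SEGMENT = re.compile(r"-(?:non)?prod-[^-]+-([^-]+)\.groups$")


def extract(pattern, text):
    """Hypothetical helper: return capture group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
    "dgov-dev-gcp-sensitive.groups",  # hypothetical: no prod/nonprod segment
]

for row in rows:
    print(row, "->", extract(PROD_SEGMENT, row))
```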


If there can be more hyphen-separated segments after prod or nonprod:

-(?:non)?prod(?:-[^-]+)*-([^-]+)\.groups$

The pattern matches:

  • -(?:non)?prod Match either -nonprod or -prod
  • (?:-[^-]+)* Optionally repeat - followed by 1+ chars other than -
  • - Match a literal -
  • ([^-]+) Capture group 1, match 1+ chars other than -
  • \.groups Match a literal .groups
  • $ End of string

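This sketch shows what the repeated group `(?:-[^-]+)*` adds. It runs both patterns from this answer over the question's rows, plus one extra row with an additional segment: `dgov-prod-gcp-eu-sensitive.groups`. That row is hypothetical and was invented for illustration. The sketch uses Python's `re`, and `extract` is again a hypothetical helper.

```python
import re

# Exactly one segment between prod/nonprod and the captured segment.
STRICT = re.compile(r"-(?:non)?prod-[^-]+-([^-]+)\.groups$")
# Zero or more "-segment" pieces before the captured segment.
FLEXIBLE = re.compile(r"-(?:non)?prod(?:-[^-]+)*-([^-]+)\.groups$")


def extract(pattern, text):
    """Hypothetical helper: return capture group 1 of the first match, or None."""
    match = pattern.search(text)
    return match.group(1) if match else None


rows = [
    "dgov-nonprod-adp-personal.groups",
    "dgov-prod-gcp-sensitive.groups",
    "dgov-prod-gcp-eu-sensitive.groups",  # hypothetical extra segment
]

for row in rows:
    print(row, extract(STRICT, row), extract(FLEXIBLE, row))
```

The strict pattern returns `None` for the extra-segment row, while the flexible one still returns `sensitive`. Both contain exactly one capturing group, because `(?:...)` does not capture.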
