Regexp_extract from URLs as strings (SQL BigQuery)

Time:02-20

I'm trying to extract a string from multiple URLs that all have one thing in common even though they are built differently. Let me give you a few examples:

/cz/category/79478/productname
/https://www.store.net/de/category/49448/productname
/https://www.store.net/category/62448/productname
/category/79455/productname

I'm using BigQuery and can write a REGEXP_EXTRACT call for each individual example, but I'm looking for a single expression that extracts the number (as a string) after category/ (79478 from the first URL). All the addresses share the /category/ part, so it should be doable.

Here's the expression that I've been trying to use:

regexp_extract(page_path, '[^category/]+/([^/]+)/')

But it doesn't work. Any idea what I'm doing wrong here?

Answer:

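The character class [^category/] matches any single character other than c, a, t, e, g, o, r, y or /; it does not match the literal text category/. Use a non-capturing group for the leading /category/ instead: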

regexp_extract(page_path, '(?:/category/)([^/]+)')
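
To check the pattern against all four paths from the question at once, something like the following should work. It is only a sketch: the inline sample_paths table and the category_id alias are made up for illustration, and page_path is assumed to be a STRING column.

```sql
-- Sample paths copied from the question; the inline table is illustrative only.
WITH sample_paths AS (
  SELECT page_path
  FROM UNNEST([
    '/cz/category/79478/productname',
    '/https://www.store.net/de/category/49448/productname',
    '/https://www.store.net/category/62448/productname',
    '/category/79455/productname'
  ]) AS page_path
)
SELECT
  page_path,
  -- With one capturing group, REGEXP_EXTRACT returns the text matched by
  -- that group; it returns NULL when the pattern does not match at all.
  REGEXP_EXTRACT(page_path, r'(?:/category/)([^/]+)') AS category_id
FROM sample_paths;
```

The non-capturing wrapper is optional here: a literal prefix is never captured, so r'/category/([^/]+)' should extract the same values.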
