I'm trying to pull product search colors from urls in my bigquery data, but the urls format changed at one point so I'm trying to pull them from two different formats.
The first one is like /someproduct/color/blue/color/red
and the second one is like /someproduct/?colors=blue,red
.
In both these examples, the person filtered on multiple colors, and I would like it the output of the extraction to be like 'blue,red' and all in one column.
I can pull the first one with:
REGEXP_EXTRACT_all(url,r'color/([A-Za-z] )')
and the second one with:
REGEXP_EXTRACT(url,r'colors=(\w*,\w*)')
The first one produces an array and I don't know how to get all the outputs to show as one line item in one row. A person can filter up to 20 colors, so there can be a lot of repeating within both of the URL types.
The second one also can use some improving as I have to add in an additional \w*,
for each additional color filtered on, and I have a case statement that handles that, but I don't think it's the most efficient way to handle it.
CodePudding user response:
Consider below approach
select url,
coalesce(regexp_extract(url, r'colors=(\w*,\w*)'), array_to_string(regexp_extract_all(url,r'color/([A-Za-z] )'), ',')) colors
from your_table
if applied to sample data in your question - output is