I am using bigquery to extract a User's Schedule data from a String.
The strings look like this:
{"0,1,2,3,4,5,6":[["11:00","21:30"]]}
-> This means that the User is scheduled to for 7 days a week from 11:00 am till 9:30 pm
{"0,1,2,3,6":[["11:00","21:30"]],"4,5":[["11:00","22:30"]]}
-> This means User will be online from 11:00 am till 9:30 pm on Sat-Wed and 11:00 am till 10:30 pm on Fridays
I want to split this into three columns, Days and Start/End time. So far I have been able to get the desired data for the first example using REGEXP_Extract Function.
Days | Start Time | End Time |
---|---|---|
0,1,2,3,4,5,6 | 11:00 | 21:30 |
Not sure how to go about doing it for the second example.
Here is the query I have used.
select
name,
periods,
REGEXP_EXTRACT(periods, "{\"(.*)\":.*") as Days,
REGEXP_EXTRACT(periods, "\\[\\[\"(.*)\",") as Start_time,
REGEXP_EXTRACT(periods, ",\"(.*)\"\\]\\]") as end_time,
from `projectid.dataset.table` where is_active_cache=1 limit 100
CodePudding user response:
You might consider below.
WITH sample_table AS (
SELECT '{"0,1,2,3,4,5,6":[["11:00","21:30"]]}' schedule UNION ALL
SELECT '{"0,1,2,3,6":[["11:00","21:30"]],"4,5":[["11:00","22:30"]]}'
)
SELECT days, start_time, end_time
FROM sample_table, UNNEST (REGEXP_EXTRACT_ALL(schedule, r'"([0-9,]*)":')) days WITH offset
JOIN UNNEST(REGEXP_EXTRACT_ALL(schedule, r'\[\["(\d{2}:\d{2})"')) start_time WITH offset USING (offset)
JOIN UNNEST(REGEXP_EXTRACT_ALL(schedule, r'"(\d{2}:\d{2})"]]')) end_time WITH offset USING (offset);
Query results
CodePudding user response:
you can use the following query:
select
name,
periods,
REGEXP_EXTRACT(periods, "{\"([^\"]*)\":.*") as Days,
REGEXP_EXTRACT(periods, "\\[\\[\"(.*)\",") as Start_time,
REGEXP_EXTRACT(periods, ",\"(.*)\"\\]\\]") as End_time,
from `projectid.dataset.table`
where is_active_cache = 1
and REGEXP_CONTAINS(periods, "{\"[^\"]*\":[\\[\\[\"[^\"]*\",\"[^\"]*\"\\]\\]]}")
limit 100
where "{\"([^\"]*)\":.*"
matches the string that starts with {"
, followed by a group of characters that does not contain "
, followed by ":
, and then any number of characters until the end of the string.
CodePudding user response:
Another approach with less cluttered code
select null as days, null as start_time, null as end_time from unnest([1]) where false
union all
select * except(schedule, pos) from (
select schedule, part, div(offset, 3) pos, mod(offset, 3) as col
from your_table, unnest(regexp_extract_all(schedule, r'"([^"]*?)"')) part with offset
)
pivot (any_value(part) for col in (0,1,2))
if applied to sample data in your question - output is