Extracting Days and Timings from a String in Bigquery-CodePudding

I am using bigquery to extract a User's Schedule data from a String.

The strings look like this:

{"0,1,2,3,4,5,6":[["11:00","21:30"]]}

-> This means that the User is scheduled to for 7 days a week from 11:00 am till 9:30 pm

{"0,1,2,3,6":[["11:00","21:30"]],"4,5":[["11:00","22:30"]]}

-> This means User will be online from 11:00 am till 9:30 pm on Sat-Wed and 11:00 am till 10:30 pm on Fridays

I want to split this into three columns, Days and Start/End time. So far I have been able to get the desired data for the first example using REGEXP_Extract Function.

Days	Start Time	End Time
0,1,2,3,4,5,6	11:00	21:30

Not sure how to go about doing it for the second example.

Here is the query I have used.

select 
name,
periods, 
REGEXP_EXTRACT(periods, "{\"(.*)\":.*") as Days,
REGEXP_EXTRACT(periods, "\\[\\[\"(.*)\",") as Start_time,
REGEXP_EXTRACT(periods, ",\"(.*)\"\\]\\]") as  end_time,

from `projectid.dataset.table` where is_active_cache=1 limit 100

CodePudding user response：

You might consider below.

WITH sample_table AS (
  SELECT '{"0,1,2,3,4,5,6":[["11:00","21:30"]]}' schedule UNION ALL
  SELECT '{"0,1,2,3,6":[["11:00","21:30"]],"4,5":[["11:00","22:30"]]}'
)
SELECT days, start_time, end_time
  FROM sample_table, UNNEST (REGEXP_EXTRACT_ALL(schedule, r'"([0-9,]*)":')) days WITH offset
  JOIN UNNEST(REGEXP_EXTRACT_ALL(schedule, r'\[\["(\d{2}:\d{2})"')) start_time WITH offset USING (offset)
  JOIN UNNEST(REGEXP_EXTRACT_ALL(schedule, r'"(\d{2}:\d{2})"]]')) end_time WITH offset USING (offset);

Query results

CodePudding user response：

you can use the following query:

select
  name,
  periods,
  REGEXP_EXTRACT(periods, "{\"([^\"]*)\":.*") as Days,
  REGEXP_EXTRACT(periods, "\\[\\[\"(.*)\",") as Start_time,
  REGEXP_EXTRACT(periods, ",\"(.*)\"\\]\\]") as End_time,
from `projectid.dataset.table`
where is_active_cache = 1
  and REGEXP_CONTAINS(periods, "{\"[^\"]*\":[\\[\\[\"[^\"]*\",\"[^\"]*\"\\]\\]]}")
limit 100

where "{\"([^\"]*)\":.*" matches the string that starts with {", followed by a group of characters that does not contain ", followed by ":, and then any number of characters until the end of the string.

CodePudding user response：

Another approach with less cluttered code

select null as days, null as start_time, null as end_time from unnest([1]) where false 
union all
select * except(schedule, pos) from (
  select schedule, part, div(offset, 3) pos, mod(offset, 3) as col
  from your_table, unnest(regexp_extract_all(schedule, r'"([^"]*?)"')) part with offset
)
pivot (any_value(part) for col in (0,1,2))

if applied to sample data in your question - output is