Good morning,
I was working on a regex to validate and extract some iata codes from an itinerary to create a departure and arrival column
I have for example this input:
123424-LYS-OPO-PAR
LYS-OPO
OPO-PAR
OPO-PAR-LYS
LYS-OPO-PAR-...-BKK
12312312-312-231-231
To get this:
departure | arrival |
---|---|
LYS | PAR |
LYS | OPO |
OPO | PAR |
OPO | LYS |
LYS | BKK |
Nothing | Nothing |
To get this in BQ. I was trying:
select
case
when regexp_contains(iata.itinerary, r'^\W-|[A-Z]{3}-[A-Z]{3}')
then split(regexp_substr(iata.itinerary), r'^\W|[A-Z]{3}-[A-Z]{3}'), '-')[offset(0)]
else 'Nothing'
end as departure,
case
when regexp_contains(iata.itinerary, r'^\W-|[A-Z]{3}-[A-Z]{3}')
then ARRAY_REVERSE(split(regexp_substr(upper(iata.itinerary), r'^\W|[A-Z]{3}-[A-Z]{3}'), '-'))[safe_offset(0)]
else 'Nothing'
end as arrival
from iata_data iata
But I dont know how to deal with more than X number of iata codes in BQ
Thanks in advance
CodePudding user response:
You might consider below.
WITH iata_data AS (
SELECT '123424-LYS-OPO-PAR' itinerary UNION ALL
SELECT 'LYS-OPO' UNION ALL
SELECT 'OPO-PAR' UNION ALL
SELECT 'OPO-PAR-LYS' UNION ALL
SELECT 'LYS-OPO-PAR-...-BKK' UNION ALL
SELECT '12312312-312-231-231'
)
SELECT COALESCE(REGEXP_EXTRACT_ALL(iata.itinerary, '[A-Z]{3}')[SAFE_OFFSET(0)], 'Nothing') departure,
COALESCE(ARRAY_REVERSE(REGEXP_EXTRACT_ALL(iata.itinerary, '[A-Z]{3}'))[SAFE_OFFSET(0)], 'Nothing') arrival
FROM iata_data iata;
or,
SELECT COALESCE(codes[SAFE_OFFSET(0)], 'Nothing') departure,
COALESCE(codes[SAFE_OFFSET(ARRAY_LENGTH(codes) - 1)], 'Nothing') arrival
FROM iata_data iata, UNNEST([STRUCT(REGEXP_EXTRACT_ALL(iata.itinerary, '[A-Z]{3}') AS codes)]);
Query results