Home > front end >  Extract multiple matching strings from a multiline string in Bigquery
Extract multiple matching strings from a multiline string in Bigquery

Time:10-06

I have two fields, key and a multiline string field called description that looks like this:

Key Description
1 [multiline string as below]
2 [multiline string as below]
3 [multiline string as below]

Example of 'description' field:

Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |

What I am ultimately trying to do is output each instance of 'system' to the key:

Key System
1 string that I needA
1 string that I needB
1 string that I needC

I have tried to use regex but struggled to find the next line break, so instead tried doing a nested from, eg

SELECT
*,
SUBSTR(system2, INSTR(system2, 'System: ') 8) AS system2_desc,
FROM (
    SELECT
    *,
    SUBSTR(description, INSTR(description, 'System: ') 8) AS system1_desc,
    FROM (
        SELECT
        CASE WHEN INSTR(description, 'System: ') > 0 THEN 1 ELSE 0 END AS contains_system,
        description,
        FROM my_table
    )
)

and then later find and remove using CHR(10) but very quickly this would be unsustainable and without knowing the number of System strings, I'd have to account for way more than I might expect.

Is there a function that can extract the System strings either as expected output above or in array that I can then do a cross join on?

CodePudding user response:

Try this one:

with mytable as (
select 
    1 as key, 
    """Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here |
""" as description
)
select key, system
from mytable, unnest(REGEXP_EXTRACT_ALL(description, r"System: (.*)\n")) as system 

enter image description here

CodePudding user response:

Try this

with mytable as
(
    select """
Some data: 12345
Random string line
System: string that I needA
Some other line here
Some other number of lines here
Some data: qwerty
Random string line
System: string that I needB
Some other line here
Some other number of lines here
Some data: 67890
Random string line
System: string that I needC
Some other line here
Some other number of lines here
""" as descp
),
cte as 
(
    select split(descp,'\n') as ls 
    from mytable
)
select replace(str, "System:", '') from cte,unnest(ls) as str
where starts_with(str,"System:" )
  • Related