Home > front end >  Repeated record to one string, BigQuery
Repeated record to one string, BigQuery

Time:02-09

I have a table that contains query jobs, with a repeated record colomn referenced_tables. There are 3 colomns and several rows inside this record, and I would like to transform the whole record into a single string, with a seperator for the colomns '.' and ',' for the rows. The colomns inside the record are project, dataset and table id's (string), and each row is a different table. I am trying something like this :

select array_to_string(
    [
        array_to_string(
            [
                referenced_tables[ordinal(1)].project_id, 
                referenced_tables[ordinal(1)].dataset_id, 
                referenced_tables[ordinal(1)].table_id
            ], "."
        ),
        array_to_string(
            [
                referenced_tables[ordinal(2)].project_id, 
                referenced_tables[ordinal(2)].dataset_id, 
                referenced_tables[ordinal(2)].table_id
            ], "."
        )
    ], ", "
)
FROM my_table

The result of this is like : project1.dataset1.table1,project2.dataset2.table2 when i specify a job. So it works but I have to repeat the array_to_string as much as there are rows, and of course the number of rows changes from a job to another, and referenced_tables can be NULL, so if i do this on the whole table, an error will occur.

Is there a way to transform the records into strings, for the whole table, with my conditions ?

CodePudding user response:

I made some assumptions about how your data is structured. Given you said this is by query there is some kind of query_id associated to group by.

Try the following

select
  string_agg(concat(t.project_id,".",t.dataset_id,".",t.table_id),", ") 
from sample_data
, unnest(referenced_tables) t

With the following sample data:

with sample_data as (
    select 1 as query_id
        , [STRUCT('my_project' as project_id, 'my_dataset' as dataset_id, 'table' as table_id)
           ,STRUCT('my_project1' as project_id, 'my_datase1' as dataset_id, 'table1' as table_id)
           ,STRUCT('my_projec2' as project_id, 'my_dataset2' as dataset_id, 'table2' as table_id)
           ] as referenced_tables
    UNION ALL 
    select 2
        , [STRUCT('my_project3' as project_id, 'my_dataset3' as dataset_id, 'table3' as table_id)
           ,STRUCT('my_project4' as project_id, 'my_datase4' as dataset_id, 'table4' as table_id)
           ,STRUCT('my_projec5' as project_id, 'my_dataset5' as dataset_id, 'table5' as table_id)
           ] as referenced_tables
)

It produces enter image description here

  •  Tags:  
  • Related