Home > database >  Bigquery sql: Replace keys of json to a new values present in a column
Bigquery sql: Replace keys of json to a new values present in a column

Time:04-07

I have two tables in Big Query as shown: table_1 has id, properties in json format, collection columns repectively.

table_2 has collection, property_key, property_name columns

I want to replace the json keys in properties column from table_1 with property_name from table_2

condition: where tabe_1.collection = table_2.collection and also property_key in table_2 contains the actual json_key's

Table_1:

id properties collection
workflow:623a776f04f1527a7bc2b3ed {"counterpartyName_string":{"type":"string","value":"XYZ"},"signerc_string":{"type":"string","value":"xyz_a"},"signer3_email":{"type":"email","value":"[email protected]"},"custome54_string":{"type":"string","value":"No (default answer)"},"paperSource_string":{"type":"string","value":"Our paper"},"acceptanceType_string":{"type":"string","value":"eSignature"},"agreementDate":{"type":"date","value":"2018-04-02"},"workflowId":{"type":"string","value":"123"},"workflowCreatedDate":{"type":"date","value":"2018-03-23"},"workflowUpdatedDate":{"type":"date","value":"2022-04-02T15:16:24.688Z"},"workflowCreator":{"type":"string","value":"6008"},"workflowOwner":{"type":"string","value":"6008"},"workflowConfigurationVersionNumber":{"type":"number","value":12.0}} 57b4c959e899aa130069e4e0
52523e7b-0a19-4551-be43-5c1d124beea0 {"amendment":{"type":"boolean","value":false},"10vs25term":{"type":"string","value":"10 Term"},"counterpartyAddress":{"type":"string","value":"USA"},"agreementDate":{"type":"date","value":"2018-11-05"},"contractType":{"type":"string","value":"ABC"},"contactForNotice":{"type":"string","value":"UMC"},"statementOfWork":{"type":"boolean","value":false},"mastercardEntity":{"type":"string","value":"Master"},"folder":{"type":"string","value":"NDA"},"importBatch":{"type":"string","value":"Sale"},"counterpartyName":{"type":"string","value":"Gra"},"documentID":{"type":"string","value":"115"},"region":{"type":"string","value":"United States"},"effectiveDate":{"type":"date","value":"2018-11-05"}} 5db09e5d9c32aaa121bf4ca1

Table_2:

collection property_key property_name
57b4c959e899aa130069e4e0 counterpartyName_string Counterparty Name
57b4c959e899aa130069e4e0 signerc_string Counterparty Signer Name
57b4c959e899aa130069e4e0 signer3_email Counterparty Signer Email
57b4c959e899aa130069e4e0 custome54_string Equipment
5db09e5d9c32aaa121bf4ca1 amendment Amendment
5db09e5d9c32aaa121bf4ca1 10vs25term 10 vs 25 Term
5db09e5d9c32aaa121bf4ca1 counterpartyAddress Counterparty Address
5db09e5d9c32aaa121bf4ca1 agreementDate Agreement Date

I wanted the result table be like a new column with changed properties field from table_1 as shown

Result_table:

id properties collection renamed_properties
workflow:623a776f04f1527a7bc2b3ed {"counterpartyName_string":{"type":"string","value":"XYZ"},"signerc_string":{"type":"string","value":"xyz_a"},"signer3_email":{"type":"email","value":"[email protected]"},"custome54_string":{"type":"string","value":"No (default answer)"},"paperSource_string":{"type":"string","value":"Our paper"},"acceptanceType_string":{"type":"string","value":"eSignature"},"agreementDate":{"type":"date","value":"2018-04-02"},"workflowId":{"type":"string","value":"123"},"workflowCreatedDate":{"type":"date","value":"2018-03-23"},"workflowUpdatedDate":{"type":"date","value":"2022-04-02T15:16:24.688Z"},"workflowCreator":{"type":"string","value":"6008"},"workflowOwner":{"type":"string","value":"6008"},"workflowConfigurationVersionNumber":{"type":"number","value":12.0}} 57b4c959e899aa130069e4e0 {"Counterparty Name":{"type":"string","value":"XYZ"},"Counterparty Signer Name":{"type":"string","value":"xyz_a"},"Counterparty Signer Email":{"type":"email","value":"[email protected]"},"Equipment":{"type":"string","value":"No (default answer)"},"Paper Source":{"type":"string","value":"Our paper"},"Acceptance Type":{"type":"string","value":"eSignature"},"Agreement Date":{"type":"date","value":"2018-04-02"},"Workflow Id":{"type":"string","value":"123"},"Workflow Created Date":{"type":"date","value":"2018-03-23"},"Workflow Updated Date":{"type":"date","value":"2022-04-02T15:16:24.688Z"},"Workflow Creator":{"type":"string","value":"6008"},"Workflow Owner":{"type":"string","value":"6008"},"Workflow Configuration Version Number":{"type":"number","value":12.0}}
52523e7b-0a19-4551-be43-5c1d124beea0 {"amendment":{"type":"boolean","value":false},"10vs25term":{"type":"string","value":"10 Term"},"counterpartyAddress":{"type":"string","value":"USA"},"agreementDate":{"type":"date","value":"2018-11-05"},"contractType":{"type":"string","value":"ABC"},"contactForNotice":{"type":"string","value":"UMC"},"statementOfWork":{"type":"boolean","value":false},"mastercardEntity":{"type":"string","value":"Master"},"folder":{"type":"string","value":"NDA"},"importBatch":{"type":"string","value":"Sale"},"counterpartyName":{"type":"string","value":"Gra"},"documentID":{"type":"string","value":"115"},"region":{"type":"string","value":"United States"},"effectiveDate":{"type":"date","value":"2018-11-05"}} 5db09e5d9c32aaa121bf4ca1 {"Amendment":{"type":"boolean","value":false},"10 vs 25 Term":{"type":"string","value":"10 Term"},"Counterparty Address":{"type":"string","value":"USA"},"Agreement Date":{"type":"date","value":"2018-11-05"},"Contract Type":{"type":"string","value":"ABC"},"Contact For Notice":{"type":"string","value":"UMC"},"Statement Of Work":{"type":"boolean","value":false},"Mastercard Entity":{"type":"string","value":"Master"},"Folder":{"type":"string","value":"NDA"},"Import Batch":{"type":"string","value":"Sale"},"Counterparty Name":{"type":"string","value":"Gra"},"Document ID":{"type":"string","value":"115"},"Region":{"type":"string","value":"United States"},"Effective Date":{"type":"date","value":"2018-11-05"}}

Can someone guide me the correct way to solve this in BigQuery using sql or sqludf's

CodePudding user response:

Consider below approach

select id, collection, 
  '{' || string_agg(renamed_property, ',' order by offset) || '}' renamed_properties
from (
  select id, collection, property, offset, 
    array_agg(replace(property, property_key, property_name) order by regexp_contains(property, property_key) desc limit 1)[offset(0)] renamed_property
  from table_1 t1,
  unnest(regexp_extract_all(properties, r'"[^"] ":{"type":"[^"] ","value":[^}] }')) as property with offset 
  left join table_2 t2
  using (collection) 
  group by id, collection, property, offset
)
group by id, collection    

if applied to sample data in y our question - output is

enter image description here

  • Related