Home > Software design >  Extract data from Complex JSON in Big-query
Extract data from Complex JSON in Big-query

Time:11-24

I have a data which looks like this -

{aa_validation: null 
 propensity_overlap: {auc pscore overlap: 0.5993614555297898 
                      auc pscore treated: 1.000000000000001 
                      auc pscore control: 1.0000000000000004
                      auc pscore ROC: 0.7524618788923345} 
 feature_balance: {% features with post matching SMD < 0.1: 100.0 
                   % features with post matching SMD < 0.25: 100.0 
                   % features with SMD improved after matching: 84.21052631578947 
                   % features with SMD not significantly worsened: 100.0}}

I want to use Big Query to make a column for each of these keys such as I get a result like this -

auc pscore overlap   auc pscore overlap...   % features with post matching SMD < 0.1   % features with post matching SMD < 0.25 ....

      0.32                    1                        50.0                      50.0

I have been going crazy using Regex_extract but cant seem to make it work. Can anyone help me extract this using Bigquery?

CodePudding user response:

This JSON schema is not suitable for BigQuery. You need to change the keys in order to be able to extract them properly.

This key, f.e, "% features with post matching SMD < 0.1" is not going to work with the JSON_EXTRACT function, as you can see here:

invalid key on sample query

Use different keys and then you will be able to launch queries like this one:

    SELECT JSON_EXTRACT(PARSE_JSON(json_field), "$.aa_validation") AS aa_validation,
JSON_EXTRACT(PARSE_JSON(json_field), "$.feature_balance") AS feature_balance,
JSON_EXTRACT(PARSE_JSON(json_field), "$.feature_balance.features_with_smd_improved_after_matching") AS smd_improved_after_matching,
FROM `qwiklabs-gcp-03-5570739e32d7.data.test2`

Taking advantage of PARSE_JSON and JSON_EXTRACT combined with JsonPath queries.

  • Related