I have a data which looks like this -
{aa_validation: null
propensity_overlap: {auc pscore overlap: 0.5993614555297898
auc pscore treated: 1.000000000000001
auc pscore control: 1.0000000000000004
auc pscore ROC: 0.7524618788923345}
feature_balance: {% features with post matching SMD < 0.1: 100.0
% features with post matching SMD < 0.25: 100.0
% features with SMD improved after matching: 84.21052631578947
% features with SMD not significantly worsened: 100.0}}
I want to use Big Query to make a column for each of these keys such as I get a result like this -
auc pscore overlap auc pscore overlap... % features with post matching SMD < 0.1 % features with post matching SMD < 0.25 ....
0.32 1 50.0 50.0
I have been going crazy using Regex_extract but cant seem to make it work. Can anyone help me extract this using Bigquery?
CodePudding user response:
This JSON schema is not suitable for BigQuery. You need to change the keys in order to be able to extract them properly.
This key, f.e, "% features with post matching SMD < 0.1" is not going to work with the JSON_EXTRACT function, as you can see here:
Use different keys and then you will be able to launch queries like this one:
SELECT JSON_EXTRACT(PARSE_JSON(json_field), "$.aa_validation") AS aa_validation,
JSON_EXTRACT(PARSE_JSON(json_field), "$.feature_balance") AS feature_balance,
JSON_EXTRACT(PARSE_JSON(json_field), "$.feature_balance.features_with_smd_improved_after_matching") AS smd_improved_after_matching,
FROM `qwiklabs-gcp-03-5570739e32d7.data.test2`
Taking advantage of PARSE_JSON and JSON_EXTRACT combined with JsonPath queries.