How to parse JSON with unknown number of keys and nested JSONs, with duplicate keys-CodePudding

I have a JSON that looks something like that one:

{
"condition_a":{
    "name": "aaa",
    "value": 10,
    "source": "bbb"
},
"condition_a":{
    "name": "ccc",
    "value": 20,
    "source": "ddd"
},
"condition_b":{
    "condition_c":{
        "name": "ddd",
        "value": 30,
        "source": "eee"
    },
    "condition_c":{
        "name": "fff",
        "value": 40,
        "source": "ggg"
    }
}

}

It has unknown number of keys, some of them hold nested JSONs within them, some of them don't. The key names are also changing and unknown. In addition, it has duplicate keys, like condition_a is duplicated throughout the JSON but holds different values within.

I want to know how to parse this JSON onto a table that looks like this:

con_1 and con_2 are the number of levels of nested JSONs it found to be - which is also unknown!

Thank you in advance!

CodePudding user response：

Consider below approach

select 
  ifnull(con_1, con_2) con_1,
  if(con_1 is null, null, con_2) con_2,
  name, value, source
from (
  select  kv,
    regexp_extract(norm_text, r'"([^"] )":{(?:"[^"] ":{[^{}] },)*' || kv || '.*}') con_1,
    trim(split(kv, ':')[offset(0)], '"') con_2, 
    regexp_extract(kv, r'"name": "([^"] )"') name,
    regexp_extract(kv, r'"value": (\d )') value,
    regexp_extract(kv, r'"source": "([^"] )"') source,
  from your_table,
  unnest([regexp_replace(regexp_replace(regexp_replace(text, r'\n', ''), r'({|,)(\s )(")', r'\1\3'), r'(")(\s )(})', r'\1\3')]) norm_text,
  unnest(regexp_extract_all(norm_text, r'"[^"] ":{\s*"name": "[^"] ",\s*"value": \d ,\s*"source": "[^"] "\s*}')) kv
)

if applied to sample data in your question - output is