Home > database >  How to flatten a nested string?
How to flatten a nested string?

Time:10-10

I have a column with data type string and it has nested key value pairs. I want to extract each key:value pair as a columnname:values.

The data looks like -

ID    attributes
1     {"category":"ABC","region":US,"item_id":[1,2,3]}
2     {"category":"PQR","region":EU,"item_id":[3,4,5]}

I want to separate these key-value pairs into different columns. The output should be as follows-

ID    category      region      item_id
1       ABC           US        [1,2,3]
2       PQR           EU        [3,4,5]

How to achieve this output? I tried using UNNEST but it didn't work.

CodePudding user response:

Consider below approach

select * from (
  select id, arr[offset(0)] key, arr[offset(1)] value 
  from your_table, unnest(split(translate(attributes, '{}', ''), ',"')) kv,
  unnest([struct(split(replace(kv, '"', ''), ':') as arr)])
)
pivot (min(value) for key in ('category', 'region', 'item_id'))         

If applied to sample data in your question - output is

enter image description here

CodePudding user response:

you can simply create a dataframe from the list of dicts in your desired out format, with python list comprehension inside of a dataframe create statement:

import pandas as pd
outpd = pd.DataFrame([x for x in mydata['attributes']])
print(outpd)  
  category region    item_id
0      ABC     US  [1, 2, 3]
1      PQR     EU  [3, 4, 5]
  • Related