I have a column with data type string and it has nested key value pairs. I want to extract each key:value pair as a columnname:values.
The data looks like -
ID attributes
1 {"category":"ABC","region":US,"item_id":[1,2,3]}
2 {"category":"PQR","region":EU,"item_id":[3,4,5]}
I want to separate these key-value pairs into different columns. The output should be as follows-
ID category region item_id
1 ABC US [1,2,3]
2 PQR EU [3,4,5]
How to achieve this output? I tried using UNNEST but it didn't work.
CodePudding user response:
Consider below approach
select * from (
select id, arr[offset(0)] key, arr[offset(1)] value
from your_table, unnest(split(translate(attributes, '{}', ''), ',"')) kv,
unnest([struct(split(replace(kv, '"', ''), ':') as arr)])
)
pivot (min(value) for key in ('category', 'region', 'item_id'))
If applied to sample data in your question - output is
CodePudding user response:
you can simply create a dataframe from the list of dicts in your desired out format, with python list comprehension inside of a dataframe create statement:
import pandas as pd
outpd = pd.DataFrame([x for x in mydata['attributes']])
print(outpd)
category region item_id
0 ABC US [1, 2, 3]
1 PQR EU [3, 4, 5]