Home > Blockchain >  Use Excel Power Query to extract JSON data
Use Excel Power Query to extract JSON data

Time:10-30

I have a spreadsheet with 5 columns. One of those columns contains a json array (the array also has nested arrays). Is it possible to use Power Query Editor to parse the json array in each row? So that each row has the four original columns plus the new columns for each value in the Json array and sub arrays? While the data in each JSON array may be unique, the structure is always the same, though the number of elements/sub arrays fluctuates. Below is an example of the data I want to transform and how I would like to transform it.

Data to be transformed

response_id response_json
a476f978-430c-47fa-a2a4-7be0d863a69e [{"sublist":false,"aggregate_category":"null_options","question":"some question."},{"sublist":[{"option_1":"some val","option_2":"another value","option_3":"more values"}],"aggregate_category":"has_options","question":"another question"},{"sublist":[],"aggregate_category":"empty_options","question":"another question"}]

Output

ID aggregate_category question option_1 option_2 option_3
a476f978-430c-47fa-a2a4-7be0d863a69e null_options some question. null null null
a476f978-430c-47fa-a2a4-7be0d863a69e has_options some question. some val another value more values
a476f978-430c-47fa-a2a4-7be0d863a69e empty_options some question. null null null

enter image description here

Edit But, what happens is that, I can transform the initial json array and expand the rows. But, the embeded json array (titled "sublist") cannot be transformed any further, because of the values that are either null or false.

I think that one issue is that while all rows have a json string, the field in the json array, titled sublist can be

sublist:[]
sublist:[{"option_1":1, "option_2: "aaa", option_3:""}]
sublist:false

I think the issue is when the value is :false, it can't be parsed. So, maybe I have to put in the script, if the value is false, I have to put in fake data? I have tried this but it fails:

=if [response_json.sublist] = "false" then "no data" else Json.Document([response_json.sublist])

which I think could be because of the script is trying to parse something that is not there?

Edit - Add advance editor code

let
    Source = Excel.Workbook(File.Contents("D:\Downloads\ExampleDataSet.xlsx"), null, true),
    Sheet1_Sheet = Source{[Item="Sheet1",Kind="Sheet"]}[Data],
    #"Changed Type" = Table.TransformColumnTypes(Sheet1_Sheet,{{"Column1", type text}, {"Column2", type text}}),
    #"Promoted Headers" = Table.PromoteHeaders(#"Changed Type", [PromoteAllScalars=true]),
    #"Changed Type1" = Table.TransformColumnTypes(#"Promoted Headers",{{"response_id", type text}, {"response_json", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),
    #"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
    fixSublist = Table.TransformColumns(response_json,{"sublist", each if  _ = false then {} else _ }),
    #"Expanded response_json1" = Table.ExpandRecordColumn(fixSublist, "response_json", {"sublist", "aggregate_category", "question"}, {"response_json.sublist", "response_json.aggregate_category", "response_json.question"})
in
    #"Expanded response_json1"

Edit Other solution

In addition to the great help from Ron, I found that adding a custom column and adding this formula also worked:

=if [response_json.sublist] <> false then
Table.ToRecords(Table.FromRecords([response_json.sublist]))
else
Table.ToRecords(
    Table.FromRecords({
        [option_1 = "null", option_2 = "null", option_3 = "null"]    
    })
)

CodePudding user response:

  • Add a Custom Column:

enter image description here

Formula:  =Json.Document([JSON Field])
  • Expand the resulting List columns into new rows; expand the resultant records column (and the sublist)

  • Arrange and rename your columns as desired

Edit: Response to your M Code posting

  • You have bad references and you have the "fixSublist" line in the wrong place
  • Please step through the Applied Steps and read the comments in the code to better understand what is going on at each step.
  • Since I obtained the initial table from an open Workbook and not a file, the first few lines of my M code are different.

Data
enter image description here

M Code

let

//Next two lines different due to my source being a table in an open workbook and not a file
    Source = Excel.CurrentWorkbook(){[Name="Table3"]}[Content],
    #"Changed Type1" = Table.TransformColumnTypes(Source,{{"response_id", type text}, {"response_json", type text}}),
    #"Parsed JSON" = Table.TransformColumns(#"Changed Type1",{{"response_json", Json.Document}}),

//expand the subtables/records/lists
    #"Expanded response_json" = Table.ExpandListColumn(#"Parsed JSON", "response_json"),
    #"Expanded response_json2" = Table.ExpandRecordColumn(#"Expanded response_json", "response_json", 
        {"sublist", "aggregate_category", "question"}, {"sublist", "aggregate_category", "question"}),

//Now that the sublists are exposed we can fix the entries that are not Lists
    fixSublist = Table.TransformColumns(#"Expanded response_json2",{"sublist", each if  _ = false then {} else _ }),

//Now expand them
    #"Expanded sublist" = Table.ExpandListColumn(fixSublist, "sublist"),
    #"Expanded sublist1" = Table.ExpandRecordColumn(#"Expanded sublist", "sublist", {"option_1", "option_2", "option_3"}, {"option_1", "option_2", "option_3"})
in
    #"Expanded sublist1"

Results
enter image description here

  • Related