How to parse nested JSON to CSV using the command line


I want to parse nested JSON to CSV. The data looks similar to this:

{"tables":[{"name":"PrimaryResult","columns":[{"name":"name","type":"string"},{"name":"id","type":"string"},{"name":"custom","type":"dynamic"}]"rows":[["Alpha","1","{\"age\":\"23\",\"number\":\"xyz\"}]]]}

I want csv file as:

name  id  age  number
alpha  1  23    xyz

I tried:

jq -r ".tables | .[] | .columns | map(.name)|@csv" demo.json > demo.csv
jq -r ".tables | .[] | .rows |.[]|@csv" demo.json >> demo.csv

But I am not getting the expected result.

Output:

name   id  custom
alpha  1   {"age":"23","number":"xyz"}

Expected:
name  id  age  number
alpha  1  23    xyz

CodePudding user response:

Assuming valid JSON input:

{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        { "name": "name",   "type": "string"  },
        { "name": "id",     "type": "string"  },
        { "name": "custom", "type": "dynamic" }
      ],
      "rows": [
        "Alpha",
        "1",
        "{\"age\":\"23\",\"number\":\"xyz\"}"
      ]
    }
  ]
}

And assuming fixed headers:

jq -r '["name", "id", "age", "number"],
(.tables[].rows | [.[0,1], (.[2] | fromjson | .age, .number)])
| @csv' input.json

Output:

"name","id","age","number"
"Alpha","1","23","xyz"

If any of these assumptions is wrong, you need to clarify your requirements, e.g.:

  • How are column names determined? (A variant that derives them from the data is sketched after this list.)
  • What happens if the input contains multiple tables?
  • Is the "dynamic" object always of the same shape? Or can it sometimes contain fewer, more, or different columns?
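For the first point, here is a minimal sketch of deriving the headers from the data instead of hardcoding them, under the same assumptions as above (a flat, single-row .rows array with the dynamic column last; input.json as before). The key order of the decoded object is taken as the column order:

jq -r '
  .tables[] as $t
  | ($t.rows[2] | fromjson) as $custom    # decode the dynamic column
  | ($custom | keys_unsorted) as $k       # its field names become the extra headers
  | ([$t.columns[:2][].name] + $k),       # header row: static column names + dynamic keys
    ($t.rows[:2] + [$custom[$k[]]])       # data row: static values + dynamic values
  | @csv
' input.json

This produces the same two CSV lines as above.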

CodePudding user response:

Assuming that the .rows array is a 2D array of rows and fields, and that a column of type "dynamic" always expects a JSON-encoded object whose fields represent further columns, which may or may not be present in every row.

Then you could transpose the headers array and the rows arrays in order to process each column according to its type, collecting all keys from the "dynamic" columns on the fly, and then transpose back to get the row-based CSV output.
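As a quick illustration of that building block (this is not the final filter): jq's transpose turns an array of rows into an array of columns, and applying it again flips it back:

$ jq -nc '[["name","id"],["Alpha","1"],["Beta","2"]] | transpose'
[["name","Alpha","Beta"],["id","1","2"]]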

Input (I have added another row for illustration):

{
  "tables": [
    {
      "name": "PrimaryResult",
      "columns": [
        {
          "name": "name",
          "type": "string"
        },
        {
          "name": "id",
          "type": "string"
        },
        {
          "name": "custom",
          "type": "dynamic"
        }
      ],
      "rows": [
        [
          "Alpha",
          "1",
          "{\"age\":\"23\",\"number\":\"123\"}"
        ],
        [
          "Beta",
          "2",
          "{\"age\":\"45\",\"word\":\"xyz\"}"
        ]
      ]
    }
  ]
}

Filter:

jq -r '
  .tables[]
  # pair the column metadata with the rows, then flip into a list of columns
  | [.columns, .rows[]] | transpose
  | map(
      # each item is now [column_meta, value1, value2, ...]
      if first.type == "string" then first |= .name     # replace the meta object with its header name
      elif first.type == "dynamic" then
        .[1:] | map(fromjson)                           # decode the JSON-encoded objects
        | (map(keys[]) | unique) as $keys               # collect all keys across all rows
        | [$keys, (.[] | [.[$keys[]]])] | transpose[]   # one column per key, null where absent
      else empty end                                    # drop columns of any other type
    )
  # flip back into rows and render
  | transpose[] | @csv
'

Output:

"name","id","age","number","word"
"Alpha","1","23","123",
"Beta","2","45",,"xyz"
