Home > Software design >  How to transform nested JSON to csv using jq
How to transform nested JSON to csv using jq

Time:11-02

I have tried to transform json in the following format to csv using jq on Linux cmd line, but with no success. Any help of guidance would be appreciated.

{
    "dir/file1.txt": [
      {
        "Setting": {
          "SettingA": "",
          "SettingB": null
        },
        "Rule": "Rulechecker.Rule15",
        "Description": "",
        "Line": 11,
        "Link": "www.sample.com",
        "Message": "Some message",
        "Severity": "error",
        "Span": [
          1,
          3
        ],
        "Match": "[id"
      },
      {
        "Setting": {
          "SettingA": "",
          "SettingB": null
        },
        "Check": "Rulechecker.Rule16",
        "Description": "",
        "Line": 27,
        "Link": "www.sample.com",
        "Message": "Fix the rule",
        "Severity": "error",
        "Span": [
          1,
          3
        ],
        "Match": "[id"
      }
    ],
    "dir/file2.txt": [
      {
        "Setting": {
          "SettingA": "",
          "SettingB": null
        },
        "Rule": "Rulechecker.Rule17",
        "Description": "",
        "Line": 51,
        "Link": "www.example.com",
        "Message": "Fix anoher 'rule'?",
        "Severity": "error",
        "Span": [
          1,
          18
        ],
        "Match": "[source,terminal]\n----\n"
      }
    ]
}

Ultimately, I want to present a matrix with dir/file1.txt, dir/file2.txt as rows on the left of the matrix, and all the keys to be presented as column headings, with the corresponding values.

| Filename  | SettingA | SettingB | Rule | More columns... |
| -------- | -------------- | -------------- | -------------- | -------------- |
| dir/file1.txt    |             | null | Rulechecker.Rule15 | |
| dir/file1.txt   |             | null | Rulechecker.Rule16 | |
| dir/file2.txt   |             | null | Rulechecker.Rule17  | |                       

CodePudding user response:

Iterate over the top-level key-value pairs obtained by to_entries to get access to the key names, then once again over its content array in .value to get the array items. Also note that newlines as present in the sample's last .Match value cannot be used as is in a line-oriented format such as CSV. Here, I chose to replace them with the literal string \n using gsub.

jq -r '
  to_entries[] | . as {$key} | .value[] | [$key,
    (.Setting | .SettingA, .SettingB),
    .Rule // .Check, .Description, .Line, .Link,
    .Message, .Severity, .Span[], .Match
    | strings |= gsub("\n"; "\\n")
  ] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"

Demo


If you just want to dump all the values in the order they appear, you can simplify this by using .. | scalars to traverse the levels of the document:

jq -r '
  to_entries[] | . as {$key} | .value[] | [$key,
    (.. | scalars) | strings |= gsub("\n"; "\\n")
  ] | @csv
'
"dir/file1.txt","",,"Rulechecker.Rule15","",11,"www.sample.com","Some message","error",1,3,"[id"
"dir/file1.txt","",,"Rulechecker.Rule16","",27,"www.sample.com","Fix the rule","error",1,3,"[id"
"dir/file2.txt","",,"Rulechecker.Rule17","",51,"www.example.com","Fix anoher 'rule'?","error",1,18,"[source,terminal]\n----\n"

Demo


As for the column headings, for the first case I'd add them manually, as you spell out each value path anyways. For the latter case it will be a little complicated as not all coulmns have immediate names (what should the items of array Span be called?), and some seem to change (in the second record, column Rule is called Check). You could, however, stick to the names of the first record, and taking the deepest field name either as is or add the array indices. Something along these lines would do:

jq -r '
  to_entries[0].value[0] | ["Filename", (
    path(..|scalars) | .[.[[map(strings)|last]]|last:] | join(".")
  )] | @csv
'
"Filename","SettingA","SettingB","Rule","Description","Line","Link","Message","Severity","Span.0","Span.1","Match"

Demo

  • Related