Home > Net >  How do I write a jq query to convert a JSON file to CSV?
How do I write a jq query to convert a JSON file to CSV?

Time:12-20

The JSON files look like:

{
  "name": "My Collection",
  "description": "This is a great collection.",
  "date": 1639717379161,
  "attributes": [
    {
      "trait_type": "Background",
      "value": "Sand"
    },
    {
      "trait_type": "Skin",
      "value": "Dark Brown"
    },
    {
      "trait_type": "Mouth",
      "value": "Smile Basic"
    },
    {
      "trait_type": "Eyes",
      "value": "Confused"
    }
  ]  
}

I found a shell script that uses jq and has this code:

i=1
for eachFile in *.json; do
    cat $i.json | jq -r '.[] | {column1: .name, column2: .description} | [.[] | tostring] | @csv' > extract-$i.csv
    echo "converted $i of many json files..."
    ((i=i 1))
done

But its output is:

jq: error (at <stdin>:34): Cannot index string with string "name"
converted 1 of many json files...

Any suggestions on how I can make this work? Thank you!

CodePudding user response:

Quick jq lesson
===========

jq filters are applied like this:
jq -r '.name_of_json_field_0 <optional filter>, .name_of_json_field_1 <optional filter>'
and so on and so forth. A single dot is the simplest filter; it leaves the data field untouched.

jq -r '.name_of_field .'

You may also leave the filter field untouched for the same effect. In your case: jq -r '.name, .description' will extract the values of both those fields.

.[] will unwrap an array to have the next piped filter applied to each unwrapped value. Example:
jq -r '.attributes | .[]
extracts all trait_types objects.

You may sometime want to repackage objects in an array by surrounding the filter in brackets:
jq -r '[.name, .description, .date]

You may sometime want to repackage data in an object by surrounding the filter in curly braces:
`jq -r '{new_field_name: .name, super_new_field_name: .description}'

playing around with these, I was able to get

jq -r '[.name, .description, .date, (.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";"")), (.attributes | [.[] | .value] | .[]] | @csv | gsub(",";";") | gsub("\"";""))] | @csv'

to give us:
"My Collection","This is a great collection.",1639717379161,"Background;Skin;Mouth;Eyes","Sand;Dark Brown;Smile Basic;Confused"

Name, description, and date were left as is, so let's break down the weird parts, one step at a time.

.attributes | [.[] | .trait_type]
.[] extracts each element of the attributes array and pipes the result of that into the next filter, which says to simply extract trait_type, where they are re-packaged in an array.

.attributes | [.[] | .trait_type] | @csv
turn the array into a csv-parsable format.

(.attributes | [.[] | .trait_type] | @csv | gsub(",";";") | gsub("\"";""))
Parens separate this from the rest of the evaluations, obviously. The first gsub here replaces commas with semicolons so they don't get interpreted as a separate field, the second removes all extra double quotes.

  • Related