Home > Software engineering >  How can I process JSON records with optional fields using jq?
How can I process JSON records with optional fields using jq?

Time:06-17

I am trying to process a stream of JSON records that looks like this:

[
{
        "id":"1234",
        "comment":"whatevz",
        "tags":[
                {"Key":"key1", "Value":"someval-1234-k1"},
                {"Key":"key2", "Value":"someval-1234-k2"},
                {"Key":"key3", "Value":"someval-1234-k3"}
        ]
},
{
        "id":"4567",
        "comment":"pff",
        "tags":[
                {"Key":"key1", "Value":"someval-4567-k1"},
                {"Key":"key3", "Value":"someval-4567-k3"}
        ]

}
]

To this:

{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": "someval-1234-k2"
}
{
  "id": "4567",
  "key1": "someval-4567-k1",
  "key2": null
}

That is, I want to preserve only a subset of the tags, not all of them, and pull them up into the main object, not leave them as nested.

The problem I'm facing is that all the tags are optional, so some records will be missing some tags (e.g., key2 in this example).

So my naive way of processing them:

jq ' 
        .[] | {
                id,
                key1:.tags[] | select(.Key | match("key1")) | .Value,
                key2:.tags[] | select(.Key | match("key2")) | .Value
        }
'

Will drop any records that are missing that tag:

{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": "someval-1234-k2"
}

I tried using the // logical or to work around this:

jq ' 
        .[] | {
                id,
                key1:.tags[] | select(.Key | match("key1")) | .Value,
                key2:.tags[] | ( select(.Key | match("key2")) // null ) | .Value        
        }
'

But this, to my surprise, seemed to duplicate the records:

{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": null
}
{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": "someval-1234-k2"
}
{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": null
}
{
  "id": "4567",
  "key1": "someval-4567-k1",
  "key2": null
}
{
  "id": "4567",
  "key1": "someval-4567-k1",
  "key2": null
}

I also tried other forms of conditional (if, try/catch), but those didn't work at all.

So I guess I have a few questions:

  1. Why is the // actually duplicating the records?

  2. How can I achieve what I want?

  3. Is there a more idiomatic way to do this transformation? The way I'm doing it seems like it could be made to work, but doesn't seem ideal. E.g., if I wanted to pull up hundreds of tags per record, I feel like I would want to use some kind of looping construct, rather than mapping them one-by-one like I'm doing. But maybe not...? I don't actually know.

Thanks.

CodePudding user response:

// is not causing the duplication - it's the double use of .tags[] - in particular the [] which "unpacks" each time.

You could instead filter .tags with map(select(...))

jq ' 
        .[] | {
                id,
                key1:.tags | map(select(.Key | match("key1")) | .Value)[0],
                key2:.tags | map(select(.Key | match("key2")) | .Value)[0]
        }
'

Producing your desired output:

{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": "someval-1234-k2"
}
{
  "id": "4567",
  "key1": "someval-4567-k1",
  "key2": null
}

As for a more idiomatic approach you could use from_entries which turns { "key": "mykey", "value": "myvalue" } into { "mykey": "myvalue" }

Here we transform .tags and add the result to the main object so we can extract all the keys together:

jq '.[] | .   (.tags | from_entries) | { id, key1, key2 }' 

{
  "id": "1234",
  "key1": "someval-1234-k1",
  "key2": "someval-1234-k2"
}
{
  "id": "4567",
  "key1": "someval-4567-k1",
  "key2": null
}
  • Related