jq regex test() selecting unexpected values

Time:11-28

I am trying to parse the JSON from aws ecr describe-images, and return a result with a specific image tag, matching a pattern, from the list. In particular I get an array with many of these entries:

{
  "imageDetails": [
    {
      "registryId": "997652729005",
      "repositoryName": "events",
      "imageDigest": "sha256:5b649219a3abc5e903b27fd947f375df8634c883432a69e40d245ac2393d67b2",
      "imageTags": [
        "events-test-build-340"
      ],
      "imageSizeInBytes": 314454408,
      "imagePushedAt": "2021-01-12T10:42:51-05:00",
      "imageScanStatus": {
        "status": "COMPLETE",
        "description": "The scan was completed successfully."
      },
      "imageScanFindingsSummary": {
        "imageScanCompletedAt": "2021-01-12T10:43:00-05:00",
        "vulnerabilitySourceUpdatedAt": "2021-01-12T04:45:25-05:00",
        "findingSeverityCounts": {}
      },
      "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "artifactMediaType": "application/vnd.docker.container.image.v1+json"
    },
    {
      "registryId": "997652729005",
      "repositoryName": "events",
      "imageDigest": "sha256:0fae259bcfe02c8cf0ec3746aae668b3166960e7119467496df9aedfbc2c8c5b",
      "imageTags": [
        "6debaabc26cc82a4011ea9c71854cebac7a57250-433",
        "6debaabc26cc82a4011ea9c71854cebac7a57250",
        "6debaabc26cc82a4011ea9c71854cebac7a57250-433-dev",
        "events-prod-build-433"
      ],
      "imageSizeInBytes": 316110570,
      "imagePushedAt": "2020-12-21T03:11:52-05:00",
      "imageScanStatus": {
        "status": "COMPLETE",
        "description": "The scan was completed successfully."
      },
      "imageScanFindingsSummary": {
        "imageScanCompletedAt": "2020-12-21T03:12:02-05:00",
        "vulnerabilitySourceUpdatedAt": "2020-11-03T20:21:09-05:00",
        "findingSeverityCounts": {}
      },
      "imageManifestMediaType": "application/vnd.docker.distribution.manifest.v2+json",
      "artifactMediaType": "application/vnd.docker.container.image.v1+json"
    }
  ]
}

I would like the output to be something like this:

{
  "tag": [
    "6debaabc26cc82a4011ea9c71854cebac7a57250"
  ],
  "sha": "sha256:5b649219a3abc5e903b27fd947f375df8634c883432a69e40d245ac2393d67b2",
  "imagePushedAt": "2021-01-12T10:42:51-05:00"
}

The challenge is to pick the images that have a tag whose name includes *prod-build* (a deployed production build), but then return the tag that has no dashes in it, which is the tag we actually use. (Yes, this is entirely defective, I know).

I have gotten pretty far:

cat ecr-describe-images-events.json \
  | jq '.imageDetails[] 
  | {tag: .imageTags, sha: .imageDigest, date_pushed: .imagePushedAt} 
  | select( .tag | contains(["prod-build"])) 
  .tag[] |= walk(
    if type=="string" then 
      select(
        match("^[^-]+$")
      ) 
    else 
      null 
    end
  )'

So: from the imageDetails array, pick and rename the three fields, then select the nodes whose tags array has a tag containing the string prod-build. From these nodes, find the tags array element whose name does not include dashes, and return that.

The last part, which I have done with select, walk, and match, behaves differently than I expect. I am getting:

{
  "tag": [
    "events-test-build-340"
  ],
  "sha": "sha256:5b649219a3abc5e903b27fd947f375df8634c883432a69e40d245ac2393d67b2",
  "date_pushed": "2021-01-12T10:42:51-05:00"
}
{
  "tag": [
    "6debaabc26cc82a4011ea9c71854cebac7a57250",
    "events-prod-build-433",
    null,
    null
  ],
  "sha": "sha256:8638389b7d83869b17b1c74ff30740d7cf8eff4574100c1270f20d4686252552",
  "date_pushed": "2021-02-17T13:11:42-05:00"
}

If I leave out the last part, starting at walk(...), I get the correct nodes. But when I do use walk with match or test, I get back array elements that don't match my regexp.

I am not fixed on my approach, or on the output format: I just need the three fields in some structure. What have I failed to understand?

CodePudding user response:

It seems you want:

jq '.imageDetails[] 
  | {tag: .imageTags, sha: .imageDigest, date_pushed: .imagePushedAt} 
  | select( any(.tag[]; test("prod-build")?))
  | .tag |= map(select(type=="string" and
        test("^[^-]+$") ))'
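
As a quick check of the filter above, here is a minimal sketch that runs it against a trimmed-down copy of the sample data. The shell variables `input` and `result`, the shortened digests and tags, and the reduced field set are assumptions made for illustration only. The regex is written as `^[^-]+$`, meaning one or more non-dash characters.

```shell
# Trimmed-down sample: only the fields the filter reads (illustrative values).
input='{"imageDetails":[
  {"imageDigest":"sha256:5b64","imageTags":["events-test-build-340"],"imagePushedAt":"2021-01-12T10:42:51-05:00"},
  {"imageDigest":"sha256:0fae","imageTags":["6debaabc-433","6debaabc","6debaabc-433-dev","events-prod-build-433"],"imagePushedAt":"2020-12-21T03:11:52-05:00"}
]}'

result=$(printf '%s' "$input" | jq -c '
  .imageDetails[]
  | {tag: .imageTags, sha: .imageDigest, date_pushed: .imagePushedAt}
  # keep only images where at least one tag mentions prod-build
  | select(any(.tag[]; test("prod-build")))
  # then keep only the tags made up entirely of non-dash characters
  | .tag |= map(select(test("^[^-]+$")))
')

echo "$result"
# {"tag":["6debaabc"],"sha":"sha256:0fae","date_pushed":"2020-12-21T03:11:52-05:00"}
```

Note the explicit `|` before `.tag |= ...`: the select and the update are two separate pipeline stages here, so images without a prod-build tag are dropped before any tag list is modified.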

CodePudding user response:

Here's a variant using contains for both checks, with the second one negated using not:

jq '.imageDetails[] | select(.imageTags | any(contains("prod-build"))) | {
  tag: .imageTags | map(select(contains("-") | not)),
  sha: .imageDigest,
  date_pushed: .imagePushedAt
}'

Output:

{
  "tag": [
    "6debaabc26cc82a4011ea9c71854cebac7a57250"
  ],
  "sha": "sha256:0fae259bcfe02c8cf0ec3746aae668b3166960e7119467496df9aedfbc2c8c5b",
  "date_pushed": "2020-12-21T03:11:52-05:00"
}

CodePudding user response:

You can use del to delete all tags that match a certain criterion:

.imageDetails[]
| {
    tag: .imageTags,
    sha: .imageDigest,
    date_pushed: .imagePushedAt
}
| select(.tag | contains(["prod-build"]))
| del(.tag[] | select(contains("-")))

This first selects the objects that have a "prod-build" tag, then deletes every tag containing a dash from each remaining tag list.

Output:

{
  "tag": [
    "6debaabc26cc82a4011ea9c71854cebac7a57250"
  ],
  "sha": "sha256:0fae259bcfe02c8cf0ec3746aae668b3166960e7119467496df9aedfbc2c8c5b",
  "date_pushed": "2020-12-21T03:11:52-05:00"
}
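
The two building blocks used here can be tried in isolation. The sketch below assumes small made-up arrays (`abc-1`, `abc`, and so on are hypothetical tags, and the variable names are illustrative). It shows that `contains` on an array of strings matches by substring, and that `del` with a `select` path removes every matching element.

```shell
# contains on arrays matches by substring: each string in the argument
# must occur inside some string of the input array.
has_prod=$(jq -nc '["events-prod-build-433"] | contains(["prod-build"])')
has_test=$(jq -nc '["events-test-build-340"] | contains(["prod-build"])')

# del removes every element picked out by the path expression.
kept=$(jq -nc '["abc-1", "abc", "abc-1-dev"] | del(.[] | select(contains("-")))')

echo "$has_prod $has_test $kept"
# true false ["abc"]
```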