Use Bash and jq to filter an array of objects based on values in an inner object, plus X objects before and after

Time:01-20

This is similar to this problem, except that I am trying to include the objects before and after each matched object.

For example, I want to find all objects whose type.name is "Pass", plus any object within X positions (say 2) of such an object, either before or after it.

This JSON:

    [
      {
        "class": "Something1",
        "type": {
          "name": "Foul"
        }
      },
      {
        "class": "Something2",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something3",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something4",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something5",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something6",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something7",
        "type": {
          "name": "Other"
        }
      },
      {
        "class": "Something8",
        "type": {
          "name": "Other"
        }
      },
      {
        "class": "Something9",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something10",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something1",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something2",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something3",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something4",
        "type": {
          "name": "Other"
        }
      },
      {
        "class": "Something5",
        "type": {
          "name": "Carry"
        }
      }
    ]

should produce this new JSON array:

    [
      {
        "class": "Something1",
        "type": {
          "name": "Foul"
        }
      },
      {
        "class": "Something2",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something3",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something4",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something5",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something6",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something8",
        "type": {
          "name": "Other"
        }
      },
      {
        "class": "Something9",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something10",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something1",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something2",
        "type": {
          "name": "Carry"
        }
      }
    ]

Alternatively, it could output a list of the indices of these objects, which could then be used to look them up in the original JSON.
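A minimal sketch of that index-list variant, assuming jq 1.5 or later is on the PATH: the first jq call prints the sorted, de-duplicated indices of every "Pass" object and its neighbours within a distance d (here 2), and the second call uses that list to pull the objects back out of the array. The input is a stand-in that rebuilds only the type names of the example above (the class fields are omitted), and the variable names are illustrative.

```shell
#!/bin/sh
# Sketch: compute the indices to keep, then look the objects up by index.
# Assumes jq 1.5+ (for --argjson). The input mirrors only the type names of
# the example above; the "class" fields are left out for brevity.
input=$(jq -nc '["Foul","Carry","Pass","Pass","Carry","Carry","Other","Other",
                 "Carry","Pass","Carry","Carry","Carry","Other","Carry"]
                | map({type: {name: .}})')

# Indices of each "Pass" object plus its neighbours within distance $d.
# range(2 * $d + 1) - $d yields -d .. d; unique sorts and de-duplicates;
# the final map drops indices that fall outside the array.
indices=$(printf '%s' "$input" | jq -c --argjson d 2 '
  length as $n
  | [ to_entries[]
      | select(.value.type.name == "Pass")
      | .key + range(2 * $d + 1) - $d ]
  | unique
  | map(select(. >= 0 and . < $n))')
echo "$indices"    # → [0,1,2,3,4,5,7,8,9,10,11]

# Use the index list to fetch the objects again (only their type names are
# printed here). Out-of-range indices must already be gone, because jq
# treats .[-1] as "last element".
selected=$(printf '%s' "$input" | jq -c --argjson idx "$indices" \
  '[ $idx[] as $i | .[$i] | .type.name ]')
echo "$selected"
```

The printed indices are the positions of the objects in the expected output above. Dropping the trailing `| .type.name` returns the full objects instead of their names.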

I can filter by type.name thanks to the answer linked above, but I could not work out how to include the surrounding objects.

    $ passes=$(jq -c '[ .[] | select(.type.name | contains("Pass")) ]' file.json)
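One detail in the attempt above: applied to strings, jq's contains is a substring test, so contains("Pass") would also accept a name such as "Passive" (a hypothetical value; no such name appears in the example). A small sketch, assuming jq is on the PATH, that compares it with an exact == comparison, which is what the answer uses:

```shell
#!/bin/sh
# Sketch: substring match (contains) versus exact match (==) in jq.
# "Passive" is a hypothetical value used only to show the difference.
sample='[{"type":{"name":"Pass"}},{"type":{"name":"Passive"}}]'

# contains() on strings checks for a substring, so both objects are selected.
by_contains=$(printf '%s' "$sample" | jq '[ .[] | select(.type.name | contains("Pass")) ] | length')

# == compares whole strings, so only the exact "Pass" object is selected.
by_equals=$(printf '%s' "$sample" | jq 'map(select(.type.name == "Pass")) | length')

echo "contains: $by_contains, ==: $by_equals"    # → contains: 2, ==: 1
```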

The files I am working with are 140,000 lines long, so efficiency is important.

Edit: Thanks @Gilles Quenot for fixing the code formatting.

Edit: Corrected errors in the JSON and explained the approach taken so far.

Answer:

Let's see …

  1. Fix the input so that it is valid, well-formed JSON.
  2. Get the index of each array element with to_entries.
  3. Use the map(select(…)) pattern to extract the keys (i.e. the indices) of all elements that match your predicate.
  4. "Pad" the extracted keys with their adjacent keys by adding range(3) - 1 to each one.
  5. Store the result in a variable.
  6. Use map(select(…)) again on the entries to extract every item whose index matches one of the previously extracted indices.
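To see what each step produces, here is a small sketch (assuming jq 1.6 or later, which IN needs) that prints the intermediate values on a hypothetical five-element input, using a distance of 1 as in the answer. The stepN variable names are illustrative only.

```shell
#!/bin/sh
# Sketch: intermediate results of the steps above on a hypothetical input.
# Assumes jq 1.6+ (IN/1).
input='[{"type":{"name":"Carry"}},{"type":{"name":"Pass"}},{"type":{"name":"Pass"}},{"type":{"name":"Carry"}},{"type":{"name":"Carry"}}]'

# Step 2: to_entries pairs each element with its index in "key".
step2=$(printf '%s' "$input" | jq -c 'to_entries | map(.key)')    # → [0,1,2,3,4]

# Step 3: keep only the indices of the "Pass" elements.
step3=$(printf '%s' "$input" | jq -c 'to_entries | map(select(.value.type.name == "Pass").key)')    # → [1,2]

# Step 4: pad every matching index with its neighbours; duplicates are harmless.
step4=$(printf '%s' "$input" | jq -c 'to_entries | map(select(.value.type.name == "Pass").key + range(3) - 1)')    # → [0,1,2,1,2,3]

# Steps 5-6: bind the padded indices to $keys, then keep entries whose key is in it.
step6=$(printf '%s' "$input" | jq -c 'to_entries
  | map(select(.value.type.name == "Pass").key + range(3) - 1) as $keys
  | map(select(.key | IN($keys[])).value.type.name)')    # → ["Carry","Pass","Pass","Carry"]

printf '%s\n' "$step2" "$step3" "$step4" "$step6"
```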

Putting it all together:

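    # Requires jq 1.6+ for IN.
    to_entries
    # indices of "Pass" items, each padded with its neighbours (-1, 0, +1)
    | map(select(.value.type.name == "Pass").key + range(3) - 1) as $keys
    # keep the objects whose index appears in $keys
    | map(select(.key | IN($keys[])).value)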

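range(3) - 1 produces the stream -1, 0, 1, so every index within a distance of 1 of a match is kept. To keep all indices within a distance of 2, as in the question, use range(5) - 2. Padded indices that fall outside the array (such as -1 next to a match at index 0) are harmless, because no entry has such a key.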
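The distance can also be passed in rather than hard-coded. A sketch, assuming jq 1.6 or later: --argjson d supplies the distance, and range(2 * $d + 1) - $d yields -d through d, so d=1 reproduces range(3) - 1 and d=2 reproduces range(5) - 2. The helper name window and the names-only stand-in input (the type names of the question's example) are illustrative.

```shell
#!/bin/sh
# Sketch: the to_entries/IN filter with the distance as a parameter.
# Assumes jq 1.6+ (IN/1). The input mirrors only the type names of the
# question's example; "window" is a hypothetical helper name.
input=$(jq -nc '["Foul","Carry","Pass","Pass","Carry","Carry","Other","Other",
                 "Carry","Pass","Carry","Carry","Carry","Other","Carry"]
                | map({type: {name: .}})')

# window D: print the type names of the objects within distance D of a "Pass".
window() {
  printf '%s' "$input" | jq -c --argjson d "$1" '
    to_entries
    | map(select(.value.type.name == "Pass").key + range(2 * $d + 1) - $d) as $keys
    | map(select(.key | IN($keys[])).value.type.name)'
}

d1=$(window 1)   # same rows as the output shown for range(3) - 1
d2=$(window 2)   # same rows as the question's expected output
printf '%s\n' "$d1" "$d2"
```

Dropping `.type.name` from the last line of the filter returns the full objects.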

Output (for the corrected input, using range(3) - 1):

    [
      {
        "class": "Something2",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something3",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something4",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something5",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something9",
        "type": {
          "name": "Carry"
        }
      },
      {
        "class": "Something10",
        "type": {
          "name": "Pass"
        }
      },
      {
        "class": "Something1",
        "type": {
          "name": "Carry"
        }
      }
    ]
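On the question's efficiency concern: IN($keys[]) compares each entry's key against the whole $keys stream, so the work grows roughly with the number of objects times the number of padded keys. A possible alternative, offered as an unmeasured sketch rather than a benchmarked improvement, sorts and de-duplicates the indices once with unique and then indexes the original array directly, which also keeps the objects in their original order. It assumes jq 1.5 or later and uses the same names-only stand-in for the question's example.

```shell
#!/bin/sh
# Sketch: variant without IN that indexes the array directly.
# Assumes jq 1.5+. The input mirrors only the type names of the question's example.
input=$(jq -nc '["Foul","Carry","Pass","Pass","Carry","Carry","Other","Other",
                 "Carry","Pass","Carry","Carry","Carry","Other","Carry"]
                | map({type: {name: .}})')

# Collect padded indices (they may repeat), sort and de-duplicate them once
# with unique, drop out-of-range ones, then fetch each object as $arr[index].
result=$(printf '%s' "$input" | jq -c --argjson d 2 '
  . as $arr
  | length as $n
  | [ to_entries[]
      | select(.value.type.name == "Pass")
      | .key + range(2 * $d + 1) - $d ]
  | unique
  | map(select(. >= 0 and . < $n) | $arr[.])')

# Print just the type names to check which objects were kept.
names=$(printf '%s' "$result" | jq -c 'map(.type.name)')
echo "$names"
```

For a real file, the same filter can be run as `jq -c --argjson d 2 '…' file.json` instead of piping from printf.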