Remove all objects containing a specific value from a JSON file using JQ?

Time:10-17

The bookmarks file for the Vivaldi browser (based on Chromium) tends to accumulate a huge number of base64-encoded thumbnails taking up a lot of space, and I would like to remove these entries. The file is a JSON file and an entry looks like this:

{
  "date_added": "13215828073144281",
  "guid": "3ace3174-ea60-42c5-88cf-e535a150ae38",
  "id": "74",
  "meta_info": {
     "Thumbnail": "data:image/jpeg;base64,/9j/4AAQSkZJRgA....AUpSgFKUoBSlKA//2Q=="
  },
  "name": "RIPE WHOIS IP Address Database Search › Look up an IP addres… - iTools",
  "type": "url",
  "url": "http://itools.com/tool/ripe-whois-ip-address"
},

I already have a jq filter looking like this:

jq 'walk(if type == "object" then with_entries(select(.key | test("Thumbnail") | not)) else . end)' Bookmarks > Bookmarks2

The problem is that this also deletes custom thumbnails like this one:

"Thumbnail": "chrome://vivaldi-data/local-image/aa0d8713-99c6-4fcb-a725-a29235c4e8b0",

So the question is, how would I remove only the Thumbnail entries containing or starting with the string data:image?

CodePudding user response:

Something like this should do the trick:

del(recurse | objects | select(has("Thumbnail")) .Thumbnail | select(startswith("data:image")))
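To try this on a small input, here is a minimal sketch. The sample document is hypothetical (it is not the real Bookmarks layout, and the ids and thumbnail values are made up); it only assumes that thumbnails can sit at any depth and that some entries have none.

```shell
# Hypothetical sample: nested entries, one of which has no thumbnail at all.
sample='{
  "roots": {
    "children": [
      { "id": "1", "meta_info": { "Thumbnail": "data:image/jpeg;base64,AAAA" } },
      { "id": "2", "meta_info": { "Thumbnail": "chrome://vivaldi-data/local-image/example" } },
      { "id": "3" }
    ]
  }
}'

# Delete only those .Thumbnail values that start with "data:image", at any depth.
# -c prints the result as compact, single-line JSON.
cleaned=$(printf '%s' "$sample" |
  jq -c 'del(recurse | objects | select(has("Thumbnail")).Thumbnail | select(startswith("data:image")))')

printf '%s\n' "$cleaned"
```

In this sample, entry 1 should be left with an empty meta_info object, while entries 2 and 3 come through unchanged. The has("Thumbnail") check is what keeps startswith from being applied to objects that carry no thumbnail.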

CodePudding user response:

You could add a second condition, .value | startswith("data:image") | not, to your select, so that an entry is kept if its .key does not match Thumbnail or its .value does not start with data:image: select((.key | test("Thumbnail") | not) or (.value | startswith("data:image") | not)). By De Morgan's laws this is equivalent to the shorter select(((.key | test("Thumbnail")) and (.value | startswith("data:image"))) | not), which drops exactly the entries whose key matches and whose value starts with data:image.
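Dropped into the walk filter from the question, the De Morgan form might look like the sketch below. The sample input is hypothetical, and it assumes a jq version in which walk is available as a builtin, as the question's own filter already does.

```shell
# Hypothetical sample: one embedded thumbnail, one custom thumbnail, one entry without either.
sample='{
  "roots": {
    "children": [
      { "id": "1", "meta_info": { "Thumbnail": "data:image/jpeg;base64,AAAA" } },
      { "id": "2", "meta_info": { "Thumbnail": "chrome://vivaldi-data/local-image/example" } },
      { "id": "3" }
    ]
  }
}'

# For every object, keep an entry unless its key matches "Thumbnail"
# AND its value starts with "data:image".
cleaned=$(printf '%s' "$sample" | jq -c '
  walk(if type == "object"
       then with_entries(select(((.key | test("Thumbnail")) and (.value | startswith("data:image"))) | not))
       else . end)')

printf '%s\n' "$cleaned"
```

Because and only evaluates its right-hand side when the key test succeeds, startswith is never applied to the non-string values stored under other keys. A key matching Thumbnail whose value is not a string would still raise an error, which this sketch does not guard against.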

However, there's a simpler approach: Assuming the overall structure is an array along the lines of

[
  {
    "date_added": "13215828073144281",
    "guid": "3ace3174-ea60-42c5-88cf-e535a150ae38",
    ...
  },
  {
    "date_added": "13215828073144282",
    "guid": "3ace3174-ea60-42c5-88cf-e535a150ae39",
    ...
  },
  ...
]

Then simply call

jq 'map(del(.meta_info.Thumbnail | strings | select(startswith("data:image"))))' Bookmarks
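The array-based call can be tried on a small input as follows. This is a sketch under the same assumption of a flat top-level array, with a hypothetical sample; the strings step is included so that entries without a Thumbnail (where .meta_info.Thumbnail evaluates to null) are skipped instead of making startswith fail on a non-string input.

```shell
# Hypothetical flat array: the third entry has no meta_info at all.
sample='[
  { "id": "1", "meta_info": { "Thumbnail": "data:image/jpeg;base64,AAAA" } },
  { "id": "2", "meta_info": { "Thumbnail": "chrome://vivaldi-data/local-image/example" } },
  { "id": "3" }
]'

# "strings" passes only string values on to startswith; null is dropped silently.
cleaned=$(printf '%s' "$sample" |
  jq -c 'map(del(.meta_info.Thumbnail | strings | select(startswith("data:image"))))')

printf '%s\n' "$cleaned"
```

As in the question, the output can be redirected to a new file (for example Bookmarks2) rather than overwriting the original.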