Home > OS >  Athena - How to query by nested json value?
Athena - How to query by nested json value?

Time:11-13

In Athena, we're trying to get all fruits that contain seeds. The issue we're facing is that seeds true/false is nested in json and we can't seem to filter by it in the WHERE clause. We can only see results when we SELECT it as hasSeeds from the nested json path, but that returns all seed results true and false instead of just true

-- This returns all seeds, but we want to filter in the WHERE clause by hasSeeds true
SELECT foodId, foodType, json_extract(payload,'$.food.info.seeds') AS hasSeeds
FROM "food_table"
WHERE foodType = 'fruit'

-- 3 attempts to filter by nested json value seeds true (not working)
SELECT * FROM
(SELECT foodId, foodType, json_extract(payload,'$.food.info.seeds') AS hasSeeds
FROM "food_table"
WHERE foodType = 'fruit')
WHERE hasSeeds = true

SELECT foodId, foodType, json_extract(payload,'$.food.info.seeds') AS hasSeeds
FROM "food_table"
WHERE foodType = 'fruit' and hasSeeds = true

SELECT foodId, foodType, json_extract(payload,'$.food.info.seeds') AS hasSeeds
FROM "food_table"
WHERE foodType = 'fruit' and json_extract(payload,'$.food.info.seeds') = true

Any idea how we can get the query to work as expected where we filter by hasSeeds true on the nested json?

Update with JSON structure:

{
  "foodId": 1,
  "foodType": "fruit",
  "payload": {
    "food": {
      "name": "apple",
      "info": {
        "seeds": true,
        "calories": 95
      }
    }
  }
}

{
  "foodId": 2,
  "foodType": "fruit",
  "payload": {
    "food": {
      "name": "banana"
    }
  }
}

{
  "foodId": 3,
  "foodType": "vegetable",
  "payload": {
  }
}

{
  "foodId": 4,
  "foodType": "fruit",
  "payload": {
  }
}

Tried this query but it returns incorrect results:

WITH dataset(jsn) AS (
    values (JSON '{"payload":{"food":{"info":{"seeds":true}}}}')
)

SELECT foodId, foodType, json_extract(payload,'$.food.info.seeds') AS hasSeeds, jsn
FROM "food_table", dataset
WHERE cast(json_extract_scalar(jsn, '$.payload.food.info.seeds') AS BOOLEAN) = true

Basically, the hasSeeds comes back as true which is correct, but other food items, like those missing a payload's children still comeback with the jsn field, instead of actually just querying only on the WHERE clause by something like payload.food.info.seeds = true

Any idea how to fix it?

CodePudding user response:

Assuming you have next json structure, you need to use json_extract_scalar:

WITH dataset(jsn) AS (
    values (JSON '{"food":{"info":{"seeds":true}}}')
)

SELECT *
FROM dataset
WHERE cast(json_extract_scalar(jsn, '$.food.info.seeds') AS BOOLEAN) = true

UPD

Based on provided json WHERE clause should look something like:

WHERE try(cast(json_extract_scalar(payload, '$.payload.food.info.seeds') AS BOOLEAN)) = true
  • Related