In the query I built, the result shows something like below:
SELECT name
,ARRAY_AGG(fruits ORDER BY time ASC) AS all_fruits
FROM table_fruits
name | all_fruits |
---|---|
Person A | Apple, Banana, Apple, Apple, Apple, Apple |
Person B | Apple, Apple, Apple, Banana, Apple, Banana |
Person C | Banana, Banana, Apple, Banana, Apple, Apple |
I want to add one more column which shows the count of apples. However, I do not want to count apples that are followed by bananas. Therefore, the additional column should look like below.
name | all_fruits | count_of_apple |
---|---|---|
Person A | Apple, Banana, Apple, Apple, Apple, Apple | 4 |
Person B | Apple, Apple, Apple, Banana, Apple, Banana | 2 |
Person C | Banana, Banana, Apple, Banana, Apple, Apple | 2 |
How would I do this in SQL? The source includes time for when the fruit was eaten.
CodePudding user response:
You can check:
- for each row you have in your parent table, what "fruits" value follows that row with the
LEAD
window function - if the row in check is the last one, it won't have a next value (it will be NULL), so the
COALESCE
function will replace this NULL value with the current "fruits" value - hence you can assign 1 to your new column when the current "fruits" value is
"Apple"
and your next value is not"Banana"
, inside aCASE
statement
SELECT *,
CASE WHEN fruits = 'Apple'
AND COALESCE(LEAD(fruits) OVER(
PARTITION BY name
ORDER BY time),
fruits) <> 'Banana'
THEN 1
END AS apples_not_after_bananas
FROM table_fruits
After this step, you can use your own code and add
- the
GROUP BY
clause you missed, to aggregate over the "name" field - the
SUM
aggregation function over the previously generated1
s when apples were not followed by bananas.
WITH cte AS (
SELECT *,
CASE WHEN fruits = 'Apple'
AND COALESCE(LEAD(fruits) OVER(
PARTITION BY name
ORDER BY time),
fruits) <> 'Banana'
THEN 1
END AS apples_not_after_bananas
FROM table_fruits
)
SELECT name,
ARRAY_AGG(fruits ORDER BY time ASC) AS all_fruits,
SUM(apples_not_after_bananas) AS count_of_apple
FROM cte
GROUP BY name
Check the demo here.
Edit: the banana came more than 1 day later
If you want to add this specific condition, or in general any conditions, you need to work inside the CASE statement, which currently has two conditions, one on the current fruit and one on the next fruit.
Checking whether the banana came more than 1 day later just means to add something like this:
CASE WHEN fruits = 'Apple'
AND COALESCE(LEAD(fruits) OVER(
PARTITION BY name
ORDER BY time),
fruits) <> 'Banana'
--AND <if difference between the current next time value is greater than 1 day>
THEN 1
END AS apples_not_after_bananas