I would like to add a new column based on the occurrence of an event for each individual (person_id) and which occurrence this is for that person by using the person_id column and date column which is in YYYY-MM-DD format. Some dates may be null.
For example, if it's the first time person '1234' is present in the table, based on the date, then return 'start', if it's the second, 'ongoing', if it's the third, 'complete' and if there's no date, e.g. null, then 'incomplete'. The columns below already exist, except the 'status' column, which would be the name of this newly created column.
I've tried various case when then scripts as well as partitions by can't quite seem to crack this for every outcome in one hit.
Desired result:
person_id | activity | combination_count | date | status |
---|---|---|---|---|
1234 | activity_1 | 1 | 2016-04-01 | start |
1234 | activity_1 | 2 | 2016-05-04 | complete |
1234 | activity_2 | 1 | null | incomplete |
5678 | activity_1 | 1 | 2019-09-01 | start |
CodePudding user response:
Consider below approach
select *,
case row_number() over(partition by person_id order by date nulls last) * if(date is null, 0, 1)
when 0 then 'incomplete'
when 1 then 'start'
when 2 then 'in progress'
when 3 then 'completed'
else 'game over'
end status
from data
if applied to sample data in your question - output is
It is not 100% clear from your question - but I think you want to count occurrences not just by person_id but also by activity - not sure - in this case just add activity
to partition by
as in partition by person_id, activity
CodePudding user response:
Might not be the most efficient way, but I think it's pretty easy to read:
WITH start AS (
SELECT person_id, MIN(date) AS date, "start" AS status, FROM data GROUP BY person_id
), complete AS (
SELECT person_id, MAX(date) AS date, "complete" AS status, FROM data GROUP BY person_id
), incomplete AS (
SELECT person_id, "incomplete" AS status, FROM data GROUP BY person_id
)
SELECT data.*, COALESCE(start.status, incomplete.status , complete.status, "ongoing") as status FROM data
LEFT JOIN start ON data.person_id = start.person_id AND data.date = start.date
LEFT JOIN complete ON data.person_id = complete.person_id AND data.date = complete.date
LEFT JOIN incomplete ON data.person_id = incomplete.person_id AND data.date IS NULL