Essentially date field is updated every month along with other fields, however one field is only updated ~6 times throughout the year. For months where that field is not updated, looking to show the most recent previous data
Date | Emp_no | Sales | Group |
---|---|---|---|
Jan | 1234 | 100 | Med |
Feb | 1234 | 200 | --- |
Mar | 1234 | 170 | --- |
Apr | 1234 | 150 | Low |
May | 1234 | 180 | --- |
Jun | 1234 | 90 | High |
Jul | 1234 | 100 | --- |
Need it to show:
Date | Emp_no | Sales | Group |
---|---|---|---|
Jan | 1234 | 100 | Med |
Feb | 1234 | 200 | Med |
Mar | 1234 | 170 | Med |
Apr | 1234 | 150 | Low |
May | 1234 | 180 | Low |
Jun | 1234 | 90 | High |
Jul | 1234 | 100 | High |
This field is not updated at set intervals, could be 1-4 months of Nulls in a row
Tried something like this to get the second most recent date but unsure how to deal with the fact that i could need between 1-4 months prior
LAG(Group)
OVER(PARTITION BY emp_no
ORDER BY date)
Thanks!
CodePudding user response:
This is the traditional "gaps and islands" problem.
There are various ways to solve it, a simple version will work for you.
First, create a new identifier that splits the rows in to "groups", where only the first row in each group is NOT NULL.
SUM(CASE WHEN "group" IS NOT NULL THEN 1 ELSE 0 END) OVER (PARTION BY emp_no ORDER BY "date") AS emp_group_id
Then you can use MAX() in another window function, as all "groups" will only have one NOT NULL value.
WITH
gaps
AS
(
SELECT
t.*,
SUM(
CASE WHEN "group" IS NOT NULL
THEN 1
ELSE 0
END
)
OVER (
PARTITION BY emp_no
ORDER BY "date"
)
AS emp_group_id
FROM
your_table t
)
SELECT
"date",
emp_no,
sales,
MAX("group")
OVER (
PARTITION BY emp_no, emp_group_id
)
AS "group"
FROM
gaps
Edit
Ignore all that.
Oracle has IGNORE NULLS.
LAST_VALUE("group" IGNORE NULLS)
OVER (
PARTITION BY emp_no
ORDER BY "date"
ROWS BETWEEN UNBOUNDED PRECEDING
AND CURRENT ROW
)
AS "group"