I have a question. I got a Table with the following
PersonalNumber | Dim_Date | PercentageCategory |
---|---|---|
1 | 20190101 | 0.81 |
1 | 20190101 | 0.81 |
1 | 20190101 | 0.81 |
1 | 20190101 | 0.81 |
1 | 20190102 | NULL |
1 | 20190103 | NULL |
1 | 20190201 | 0.81 |
1 | 20190301 | 0.86 |
1 | 20190401 | 0.88 |
1 | 20190501 | 0.87 |
1 | 20190601 | 0.88 |
1 | 20190701 | 0.66 |
1 | 20190702 | NULL |
1 | 20190801 | 0.82 |
1 | 20190901 | 0.83 |
1 | 20191001 | 0.84 |
2 | 20190101 | 0.81 |
2 | 20190201 | 0.81 |
2 | 20190301 | 0.86 |
2 | 20190401 | 0.88 |
2 | 20190501 | 0.87 |
2 | 20190601 | 0.88 |
2 | 20190701 | 0.86 |
2 | 20190801 | 0.82 |
2 | 20190901 | 0.83 |
2 | 20191001 | 0.84 |
What I would like to add is an column who looks at the percentage of the past 6 months where the percentage is consecutively is higher then 0.80. If yes, then it gets an 1 else 0.
Outcome
PersonalNumber | Dim_Date | PercentageCategory | NEWCOLUMN |
---|---|---|---|
1 | 20190101 | 0.81 | 0 |
1 | 20190101 | 0.81 | 0 |
1 | 20190101 | 0.81 | 0 |
1 | 20190101 | 0.81 | 0 |
1 | 20190102 | NULL | 0 |
1 | 20190103 | NULL | 0 |
1 | 20190201 | 0.81 | 0 |
1 | 20190301 | 0.86 | 0 |
1 | 20190401 | 0.88 | 0 |
1 | 20190501 | 0.87 | 0 |
1 | 20190601 | 0.88 | 0 |
1 | 20190701 | 0.66 | 1 |
1 | 20190701 | 0.66 | 1 |
1 | 20190701 | 0.66 | 1 |
1 | 20190701 | 0.66 | 1 |
1 | 20190702 | NULL | 0 |
1 | 20190801 | 0.82 | 0 |
1 | 20190901 | 0.83 | 0 |
1 | 20191001 | 0.84 | 0 |
2 | 20190101 | 0.81 | 0 |
2 | 20190201 | 0.81 | 0 |
2 | 20190301 | 0.86 | 0 |
2 | 20190401 | 0.88 | 0 |
2 | 20190501 | 0.87 | 0 |
2 | 20190601 | 0.88 | 0 |
2 | 20190701 | 0.86 | 1 |
2 | 20190801 | 0.82 | 1 |
2 | 20190901 | 0.83 | 1 |
2 | 20191001 | 0.84 | 1 |
In my example july 0f 2019 gets an 1 for the client number 1, because the past 6 months every month the percentage was higher then 0.80. For august it gets an 0, because july had an percentage lower than 0.80.
I couldn't find any information on stackoverflow with people who had a similiar problem or I couldn't convert the solution from them to my example. If you guys could help me, that would be gladly appreciated.
CodePudding user response:
CREATE TABLE Junk (
PersonalNumber int NOT NULL,
DimDate date NOT NULL,
PercentCategory float NOT NULL,
CONSTRAINT Junk_PK PRIMARY KEY CLUSTERED (PersonalNumber, DimDate)
)
SELECT j.PersonalNumber, j.DimDate, j.PercentCategory,
CASE
WHEN m.PercentCategoryMin >= 0.8 AND m.MinDate <= DATEADD(MONTH, -6, j.DimDate)
THEN 1
ELSE 0
END AS NewThing
FROM Junk j
INNER JOIN (
SELECT j.PersonalNumber, j.DimDate, MIN(k.DimDate) AS MinDate, MIN(k.PercentCategory) AS PercentCategoryMin
FROM Junk j
INNER JOIN Junk k
ON j.PersonalNumber = k.PersonalNumber
AND k.DimDate < j.DimDate
AND k.DimDate >= DATEADD(MONTH, -6, j.DimDate)
GROUP BY j.PersonalNumber, j.DimDate
) AS m
ON j.PersonalNumber = m.PersonalNumber AND j.DimDate = m.DimDate
CodePudding user response:
Put a bunch of lag
s into a CTE, select only rows that don't have a null PercentageCategory
, and then just a case
expression:
with months as
(
select t.*
, lag(PercentageCategory, 1) over (partition by PersonalNumber order by Dim_Date) m1
, lag(PercentageCategory, 2) over (partition by PersonalNumber order by Dim_Date) m2
, lag(PercentageCategory, 3) over (partition by PersonalNumber order by Dim_Date) m3
, lag(PercentageCategory, 4) over (partition by PersonalNumber order by Dim_Date) m4
, lag(PercentageCategory, 5) over (partition by PersonalNumber order by Dim_Date) m5
, lag(PercentageCategory, 6) over (partition by PersonalNumber order by Dim_Date) m6
from tbl t
where t.PercentageCategory is not null
)
select PersonalNumber
, Dim_Date
, PercentageCategory
, case when m1 > 0.8 and m2 > 0.8 and m3 > 0.8 and m4 > 0.8 and m5 > 0.8 and m6 > 0.8 then 1 else 0 end NEWCOLUMN
from months
CodePudding user response:
My solution assumes that any date having the day part > 1 have NULLs in the percentage column. Also my result does not create new duplicate columns as in your example where July occurs 4 times in the result but only once in the input data.
I called your table t
:
SELECT
PersonalNumber,
Dim_Date,
PercentageCategory,
CASE WHEN (SELECT COUNT(*)
FROM
(SELECT DISTINCT * FROM t) b
WHERE
b.PersonalNumber = a.PersonalNumber AND
b.Dim_Date >= DateAdd(m, -6, a.Dim_Date) AND b.Dim_Date < a.Dim_Date AND
b.PercentageCategory > 0.8
) = 6 THEN 1 ELSE 0 END AS NewColumn
FROM t a
See: http://sqlfiddle.com/#!18/796ab/7
CodePudding user response:
You can use a bunch of LAG
window functions. Note that there are no joins here, just a single scan of the base table
SELECT *,
NewColumn =
CASE
WHEN LAG(t.PercentageCategory, 1) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 2) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 3) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 4) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 5) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 6) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
THEN 1
ELSE 0 END
FROM YourTable t
WHERE t.PercentageCategory IS NOT NULL;
This assumes, as you say, that there are no gaps or duplicates between rows which have a not-null PercentageCategory
, otherwise you need to group up the rows, or use a top-1 solution
SELECT *,
NewColumn =
CASE
WHEN LAG(t.PercentageCategory, 1) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 2) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 3) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 4) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 5) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
AND LAG(t.PercentageCategory, 6) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
THEN 1
ELSE 0 END
FROM (
SELECT
t.PersonalNumber,
t.Dim_Date,
PercentageCategory = MIN(t.PercentageCategory)
FROM YourTable t
WHERE t.PercentageCategory IS NOT NULL
GROUP BY
t.PersonalNumber,
t.Dim_Date
) t;