Home > Back-end >  If value is higher then X for the past 6 months then 1 else 0
If value is higher then X for the past 6 months then 1 else 0

Time:09-03

I have a question. I got a Table with the following

PersonalNumber Dim_Date PercentageCategory
1 20190101 0.81
1 20190101 0.81
1 20190101 0.81
1 20190101 0.81
1 20190102 NULL
1 20190103 NULL
1 20190201 0.81
1 20190301 0.86
1 20190401 0.88
1 20190501 0.87
1 20190601 0.88
1 20190701 0.66
1 20190702 NULL
1 20190801 0.82
1 20190901 0.83
1 20191001 0.84
2 20190101 0.81
2 20190201 0.81
2 20190301 0.86
2 20190401 0.88
2 20190501 0.87
2 20190601 0.88
2 20190701 0.86
2 20190801 0.82
2 20190901 0.83
2 20191001 0.84

What I would like to add is an column who looks at the percentage of the past 6 months where the percentage is consecutively is higher then 0.80. If yes, then it gets an 1 else 0.

Outcome

PersonalNumber Dim_Date PercentageCategory NEWCOLUMN
1 20190101 0.81 0
1 20190101 0.81 0
1 20190101 0.81 0
1 20190101 0.81 0
1 20190102 NULL 0
1 20190103 NULL 0
1 20190201 0.81 0
1 20190301 0.86 0
1 20190401 0.88 0
1 20190501 0.87 0
1 20190601 0.88 0
1 20190701 0.66 1
1 20190701 0.66 1
1 20190701 0.66 1
1 20190701 0.66 1
1 20190702 NULL 0
1 20190801 0.82 0
1 20190901 0.83 0
1 20191001 0.84 0
2 20190101 0.81 0
2 20190201 0.81 0
2 20190301 0.86 0
2 20190401 0.88 0
2 20190501 0.87 0
2 20190601 0.88 0
2 20190701 0.86 1
2 20190801 0.82 1
2 20190901 0.83 1
2 20191001 0.84 1

In my example july 0f 2019 gets an 1 for the client number 1, because the past 6 months every month the percentage was higher then 0.80. For august it gets an 0, because july had an percentage lower than 0.80.

I couldn't find any information on stackoverflow with people who had a similiar problem or I couldn't convert the solution from them to my example. If you guys could help me, that would be gladly appreciated.

CodePudding user response:

CREATE TABLE Junk (
    PersonalNumber  int   NOT NULL,
    DimDate         date  NOT NULL,
    PercentCategory float NOT NULL,

    CONSTRAINT Junk_PK PRIMARY KEY CLUSTERED (PersonalNumber, DimDate)
)

SELECT j.PersonalNumber, j.DimDate, j.PercentCategory,
    CASE
        WHEN m.PercentCategoryMin >= 0.8 AND m.MinDate <= DATEADD(MONTH, -6, j.DimDate)
            THEN 1
        ELSE 0
    END AS NewThing
FROM Junk j
    INNER JOIN (
        SELECT j.PersonalNumber, j.DimDate, MIN(k.DimDate) AS MinDate, MIN(k.PercentCategory) AS PercentCategoryMin
        FROM Junk j
            INNER JOIN Junk k
                ON j.PersonalNumber = k.PersonalNumber
                AND k.DimDate < j.DimDate
                AND k.DimDate >= DATEADD(MONTH, -6, j.DimDate)
        GROUP BY j.PersonalNumber, j.DimDate
    ) AS m
        ON j.PersonalNumber = m.PersonalNumber AND j.DimDate = m.DimDate

CodePudding user response:

Put a bunch of lags into a CTE, select only rows that don't have a null PercentageCategory, and then just a case expression:

with months as
(
    select t.*
     , lag(PercentageCategory, 1) over (partition by PersonalNumber order by Dim_Date) m1
     , lag(PercentageCategory, 2) over (partition by PersonalNumber order by Dim_Date) m2
     , lag(PercentageCategory, 3) over (partition by PersonalNumber order by Dim_Date) m3
     , lag(PercentageCategory, 4) over (partition by PersonalNumber order by Dim_Date) m4
     , lag(PercentageCategory, 5) over (partition by PersonalNumber order by Dim_Date) m5
     , lag(PercentageCategory, 6) over (partition by PersonalNumber order by Dim_Date) m6
    from tbl t
    where t.PercentageCategory is not null
)
select PersonalNumber
    , Dim_Date
    , PercentageCategory
    , case when m1 > 0.8 and m2 > 0.8 and m3 > 0.8 and m4 > 0.8 and m5 > 0.8 and m6 > 0.8 then 1 else 0 end NEWCOLUMN 
from months

CodePudding user response:

My solution assumes that any date having the day part > 1 have NULLs in the percentage column. Also my result does not create new duplicate columns as in your example where July occurs 4 times in the result but only once in the input data.

I called your table t:

SELECT 
  PersonalNumber,
  Dim_Date,
  PercentageCategory,
  CASE WHEN (SELECT COUNT(*)
   FROM 
       (SELECT DISTINCT * FROM t) b
   WHERE
       b.PersonalNumber = a.PersonalNumber AND      
       b.Dim_Date >= DateAdd(m, -6, a.Dim_Date) AND b.Dim_Date < a.Dim_Date AND
       b.PercentageCategory > 0.8
  ) = 6 THEN 1 ELSE 0 END AS NewColumn
FROM t a

See: http://sqlfiddle.com/#!18/796ab/7

CodePudding user response:

You can use a bunch of LAG window functions. Note that there are no joins here, just a single scan of the base table

SELECT *,
  NewColumn =
     CASE
        WHEN LAG(t.PercentageCategory, 1) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 2) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 3) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 4) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 5) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 6) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
        THEN 1
        ELSE 0 END
FROM YourTable t
WHERE t.PercentageCategory IS NOT NULL;

db<>fiddle

This assumes, as you say, that there are no gaps or duplicates between rows which have a not-null PercentageCategory, otherwise you need to group up the rows, or use a top-1 solution

SELECT *,
  NewColumn =
     CASE
        WHEN LAG(t.PercentageCategory, 1) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 2) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 3) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 4) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 5) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
         AND LAG(t.PercentageCategory, 6) OVER (PARTITION BY t.PersonalNumber ORDER BY t.Dim_Date) > 0.8
        THEN 1
        ELSE 0 END
FROM (
    SELECT
      t.PersonalNumber,
      t.Dim_Date,
      PercentageCategory = MIN(t.PercentageCategory)
    FROM YourTable t
    WHERE t.PercentageCategory IS NOT NULL
    GROUP BY
      t.PersonalNumber,
      t.Dim_Date
) t;
  • Related