Home > Software design >  create additional date after and before current row and create new column based on it
create additional date after and before current row and create new column based on it

Time:01-23

Lets say I have this kind of data

create table example
(cust_id VARCHAR, product VARCHAR, price float, datetime varchar);

insert into example (cust_id, product, price, datetime)
VALUES
('1', 'scooter', 2000, '2022-01-10'),
('1', 'skateboard', 1500, '2022-01-20'),
('1', 'beefmeat', 300, '2022-06-08'),
('2', 'wallet', 200, '2022-02-25'),
('2', 'hairdryer', 250, '2022-04-28'),
('3', 'skateboard', 1600, '2022-03-29')

I want to make some kind of additional rows, and after that make new column based on this additional rows

My expectation output will like this

cust_id total_price date is_active
1 3500 2022-01 active
1 0 2022-02 active
1 0 2022-03 active
1 0 2022-04 inactive
1 0 2022-05 inactive
1 300 2022-06 active
1 0 2022-07 active
2 0 2022-01 inactive
2 200 2022-02 active
2 0 2022-03 active
2 250 2022-04 active
2 0 2022-05 active
2 0 2022-06 active
2 0 2022-07 inactive
3 0 2022-01 inactive
3 0 2022-02 inactive
3 1600 2022-03 active
3 0 2022-04 active
3 0 2022-05 active
3 0 2022-06 inactive
3 0 2022-07 inactive

the rules is like this

  1. the first month when the customer make transaction is called active, before this transaction called inactive.
    ex: first transaction in month 2, then month 2 is active, month 1 is inactive (look cust_id 2 and 3)
  2. if more than 2 months there isnt transaction, the next month is inactive until there is new transaction is active.
    ex: if last transaction in month 1, then month 2 and month 3 is inactive, and month 4, month 5 inactive if month 6 there is new transaction (look cust_id 1 and 3)

well my first thought is used this code, but I dont know what the next step after it

select *, 
    date_part('month', age(to_date(date, 'YYYY-MM'), to_date(lag(date) over (partition by cust_id order by date),'YYYY-MM')))date_diff
from(
    select 
        cust_id,
        sum(price)total_price,
        to_char(to_date(datetime, 'YYYY-MM-DD'),'YYYY-MM')date
    from example
    group BY
        cust_id,
        date
    order by 
    cust_id, 
    date)test

I'm open to any suggestion

CodePudding user response:

Try the following, an explanation within query comments:

/* use generate_series to generate a series of dates
   starting from the min date of datetime up to the 
   max datetime with one-month intervals, then do a 
   cross join with the distinct cust_id to map each cust_id 
   to each generated date.*/
WITH cust_dates AS
(
  SELECT EX.cust_id, to_char(dts, 'YYYY-mm') dts
  FROM generate_series 
        (
         (SELECT MIN(datetime)::timestamp FROM example), 
         (SELECT MAX(datetime)::timestamp   '2 month'::interval  FROM example),
         '1 month'::interval
        ) dts
  CROSS JOIN (SELECT DISTINCT cust_id FROM example) EX
),
/* do a left join with your table to find prices
   for each cust_id/ month, and aggregate for cust_id, month_date
   to find the sum of prices for each cust_id, month_date.
*/
monthly_price AS
(
  SELECT CD.cust_id,
       CD.dts AS month_date,
       COALESCE(SUM(price), 0) total_price
  FROM cust_dates CD LEFT JOIN example EX
  ON CD.cust_id = EX.cust_id AND 
     CD.dts = to_char(EX.datetime, 'YYYY-mm')
  GROUP BY CD.cust_id, CD.dts
)
/* Now, we have the sum of monthly prices for each cust_id,
   we can use the max window function with "ROWS BETWEEN 2 PRECEDING AND CURRENT ROW"
   to check if one of the (current month or the previous two months) has a sum of prices > 0.
*/
SELECT cust_id, month_date, total_price,
  CASE MAX(total_price) OVER 
      (PARTITION BY cust_id ORDER BY month_date 
       ROWS BETWEEN 2 PRECEDING AND CURRENT ROW)
    WHEN 0 THEN 'inactive' 
    ELSE 'active'
  END AS is_active
FROM monthly_price
ORDER BY cust_id, month_date

See demo

  • Related