I'm trying to re-organize the format of data output.
Here is the original data output I can generate with SQL query as below:
Company Line Customer Start date spending
A s1 Tom 2/1/2021 $10.00
A s1 Bill 2/1/2021 $50.00
A s1 Tom 2/2/2021 $20.00
A s3 Tom 2/3/2021 $6.00
B s1 Tom 2/4/2021 $20.00
B s4 Bill 2/5/2021 $70.00
C s2 Ben 2/7/2021 $80.00
A s2 Tom 3/5/2021 $5.00
A s2 Tom 3/6/2021 $5.00
SQL code I use:
select
D.Company
, D.Line
, D.Customer
, D.Start_data
, D.spending
from Provider_db D
Here is the format of output I expect:
Company Line Customer Start date End date Duration Total_spending
A s1 Tom 2/1/2021 2/2/2021 2 =10 20=26
A s1 Bill 2/1/2021 2/1/2021 1 50
A s3 Tom 2/3/2021 2/3/2021 1 6
A s2 Tom 3/5/2021 3/6/2021 2 =5 5=10
B s1 Tom 2/5/2021 2/5/2021 1 70
B s4 Bill 2/5/2021 2/5/2021 1 70
C s2 Ben 2/7/2021 2/7/2021 1 80
The criteria to generate the expected output is:
Considering product line types the customers take
"End date" is determined by the last "start date" that the same customer (e.g. Tom) has continue same product line (e.g. s1) with same company (e.g. A). "continues" means the interval between two "start date" for same product line and same company must be 1 day (e.g. 2/1-2/2). if a customer has only one record for a company (e.g. Bill at 2/1 with company A), its "end date" will be its "start date".
"Total spending" is the sum of those records that meet the above conditions.
"Duration" is ((End date)-(start date) 1).
Thank you very much in advance!
CodePudding user response:
Basically you want to group by Customer with same line as a group and then calculate the Duration and Total Spending.
First cte use lead() to find when there is a change in Line
.
Second cte calculate a cumulative sum()
to form the group
Final query just do a GROUP BY
and calculate the required Duration
& Total Spending
with
cte as
(
select *,
g = case when Line = lead(Line) over (partition by Company, Customer
order by StartDate)
then 1
else 0
end
from tbl
),
cte2 as
(
select *, grp = sum(g) over (partition by Company, Customer
order by StartDate)
from cte
)
select Company, Line, Customer,
StartDate = min(StartDate),
EndDate = max(StartDate),
Duration = datediff(day, min(StartDate), max(StartDate)) 1,
TotalSpending = sum(Spending)
from cte2
group by Company, Line, Customer, grp
order by Company, Line, Customer
CodePudding user response:
Thanks squirrel!
I tried your solution. Looks some cases work but some not. Here is the case that doesn't work. Can you please take it look?
https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=7612fc274ad54cb3a7b9eb81a1198bbe
create table tbl
(
Company char,
Line char(2),
Customer varchar(5),
StartDate date,
Spending decimal(10,2)
);
insert into tbl values
('A', 's1', 'Tom', '20210201', 10.00),
('A', 's1', 'Tom', '20210202', 10.00),
('A', 's1', 'Tom', '20210203', 10.00),
('A', 's1', 'Tom', '20210204', 10.00),
('A', 's1', 'Tom', '20210206', 10.00),
('A', 's1', 'Tom', '20210201', 10.00),
('A', 's1', 'Tom', '20210207', 10.00),
('A', 's1', 'Tom', '20210201', 10.00);
with
cte as
(
select *,
g = case when Line = lead(Line) over (partition by Company, Customer
order by StartDate)
then 1
else 0
end
from tbl
),
cte2 as
(
select *, grp = sum(g) over (partition by Company, Customer
order by StartDate)
from cte
)
select Company, Line, Customer,
StartDate = min(StartDate),
EndDate = max(StartDate),
Duration = datediff(day, min(StartDate), max(StartDate)) 1,
TotalSpending = sum(spending)
from cte2
group by Company, Line, Customer, grp
order by Company, Line, Customer
Outputs:
Company Line Customer StartDate EndDate Duration TotalSpending
A s1 Tom 2021-02-01 2021-02-01 1 30.00
A s1 Tom 2021-02-02 2021-02-02 1 10.00
A s1 Tom 2021-02-03 2021-02-03 1 10.00
A s1 Tom 2021-02-04 2021-02-04 1 10.00
A s1 Tom 2021-02-06 2021-02-07 2 20.00