Home > other >  Re-Organize Data Output/Format in SQL
Re-Organize Data Output/Format in SQL

Time:07-17

I'm trying to re-organize the format of data output.

Here is the original data output I can generate with SQL query as below:

enter image description here

Company Line    Customer    Start date  spending
A       s1      Tom         2/1/2021    $10.00
A       s1      Bill        2/1/2021    $50.00
A       s1      Tom         2/2/2021    $20.00
A       s3      Tom         2/3/2021    $6.00
B       s1      Tom         2/4/2021    $20.00
B       s4      Bill        2/5/2021    $70.00
C       s2      Ben         2/7/2021    $80.00
A       s2      Tom         3/5/2021    $5.00
A       s2      Tom         3/6/2021    $5.00

SQL code I use:

select 
D.Company
, D.Line
, D.Customer
, D.Start_data
, D.spending
from Provider_db D

Here is the format of output I expect:

enter image description here

Company Line    Customer    Start date  End date    Duration Total_spending
A       s1      Tom         2/1/2021    2/2/2021    2            =10 20=26
A       s1      Bill        2/1/2021    2/1/2021    1             50
A       s3      Tom         2/3/2021    2/3/2021    1             6
A       s2      Tom         3/5/2021    3/6/2021    2             =5 5=10
B       s1      Tom         2/5/2021    2/5/2021    1             70
B       s4      Bill        2/5/2021    2/5/2021    1             70
C       s2      Ben         2/7/2021    2/7/2021    1             80

The criteria to generate the expected output is:

  1. Considering product line types the customers take

  2. "End date" is determined by the last "start date" that the same customer (e.g. Tom) has continue same product line (e.g. s1) with same company (e.g. A). "continues" means the interval between two "start date" for same product line and same company must be 1 day (e.g. 2/1-2/2). if a customer has only one record for a company (e.g. Bill at 2/1 with company A), its "end date" will be its "start date".

  3. "Total spending" is the sum of those records that meet the above conditions.

  4. "Duration" is ((End date)-(start date) 1).

Thank you very much in advance!

CodePudding user response:

Basically you want to group by Customer with same line as a group and then calculate the Duration and Total Spending.

First cte use lead() to find when there is a change in Line.

Second cte calculate a cumulative sum() to form the group

Final query just do a GROUP BY and calculate the required Duration & Total Spending

with
cte as
(
    select *, 
           g = case when Line = lead(Line) over (partition by Company, Customer 
                                                     order by StartDate)
                    then 1
                    else 0
                    end
    from   tbl
),
cte2 as
(
    select *, grp = sum(g) over (partition by Company, Customer 
                                     order by StartDate)
    from   cte
)
select Company, Line, Customer,
       StartDate = min(StartDate),
       EndDate   = max(StartDate),
       Duration  = datediff(day, min(StartDate), max(StartDate))   1,
       TotalSpending = sum(Spending)
from   cte2
group by Company, Line, Customer, grp
order by Company, Line, Customer

db<>fiddle demo

CodePudding user response:

Thanks squirrel!

I tried your solution. Looks some cases work but some not. Here is the case that doesn't work. Can you please take it look?

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=7612fc274ad54cb3a7b9eb81a1198bbe

create table tbl
(
  Company   char,
  Line      char(2),
  Customer  varchar(5),
  StartDate date,
  Spending  decimal(10,2)
);

insert into tbl values 
('A',       's1',      'Tom',         '20210201',    10.00),
('A',       's1',      'Tom',         '20210202',    10.00),
('A',       's1',      'Tom',         '20210203',    10.00),
('A',       's1',      'Tom',         '20210204',    10.00),
('A',       's1',      'Tom',         '20210206',    10.00),
('A',       's1',      'Tom',         '20210201',    10.00),
('A',       's1',      'Tom',         '20210207',    10.00),
('A',       's1',      'Tom',         '20210201',    10.00);

with
cte as
(
    select *, 
           g = case when Line = lead(Line) over (partition by Company, Customer 
                                                     order by StartDate)
                    then 1
                    else 0
                    end
    from   tbl
),
cte2 as
(
    select *, grp = sum(g) over (partition by Company, Customer 
                                     order by StartDate)
    from   cte
)
select Company, Line, Customer,
       StartDate = min(StartDate),
       EndDate = max(StartDate),
       Duration = datediff(day, min(StartDate), max(StartDate))   1,
       TotalSpending = sum(spending)
from   cte2
group by Company, Line, Customer, grp
order by Company, Line, Customer

Outputs:

Company Line    Customer    StartDate   EndDate    Duration TotalSpending
A        s1      Tom       2021-02-01   2021-02-01  1      30.00
A        s1      Tom       2021-02-02   2021-02-02  1      10.00
A        s1      Tom       2021-02-03   2021-02-03  1      10.00
A        s1      Tom       2021-02-04   2021-02-04  1      10.00
A        s1      Tom       2021-02-06   2021-02-07  2      20.00
  • Related