Home > Software design >  SQL Server : finding distribution of shipping modes for each state
SQL Server : finding distribution of shipping modes for each state

Time:06-18

Sorry if the question doesn't make any sense. I'm practicing on a sample sales data sheet and want to find distribution of shipping modes for each state. The following code finds distribution for shipping modes on entire sheet:

SELECT 
    ship_mode,
    ROUND(CAST(COUNT(ship_mode) * 100 AS float) / (SELECT CAST(COUNT(*) AS float) FROM sales_data_clean), 2) AS [Percentage]
FROM 
    sales_data_clean
GROUP BY 
    ship_mode
ORDER BY 
    ship_mode

Output:

First Class     15.32
Same Day         5.49
Second Class    19.41
Standard Class  59.79

Now I want to find the distribution for each state, so sample output should look something like this (using random percentage values):

State   Ship_mode       Percentage
----------------------------------
Alabama First Class     15.0
Alabama Same Day        15.0
Alabama Second Class    30.0
Alabama Standard Class  40.0
Arizona First Class     25.0
Arizona Same Day        25.0
Arizona Second Class    25.0
Arizona Standard Class  25.0

whereas each state's ship modes combined should add up to 100. Right now I've tried this code:

SELECT state, ship_mode,
       round(cast(count(ship_mode) * 100 as float) /(SELECT cast(count(*) as float) from sales_data_clean), 2) as [Percentage]
FROM sales_data_clean
GROUP BY ship_mode, state
ORDER BY state,ship_mode

Alabama First Class     0.09
Alabama Same Day        0.01
Alabama Second Class    0.18
Alabama Standard Class  0.34
Arizona First Class     0.43
Arizona Same Day        0.16
Arizona Second Class    0.28
Arizona Standard Class  1.41

As you can see, the percentage column reflects each ship modes count divided by grand total of rows, instead of being divided by each state.

Thank you in advance

Bonus - is there a way to optimize this line of code?

ROUND(CAST(COUNT(ship_mode) * 100 AS float) / (SELECT CAST(COUNT(*) AS float) FROM sales_data_clean), 2)

CodePudding user response:

The general calculation isn't too hard. Getting the total to be exactly 100 is trickier. Try something like this:

with data as (
    SELECT state, ship_mode,
        cast(round(count(*) * 100e / sum(count(*)) over (), 2) as decimal(5, 2)) as pct,
        row_number() over (order by count(*), state, ship_mode) as rn1,
        row_number() over (order by count(*) desc, state desc, ship_mode desc) as rn2
    FROM sales_data_clean
    GROUP BY state, ship_mode
)
select state, ship_mode, sum(pct) over (), pct,
    case
        when sum(pct) over () < 100 and rn2 = 1 then pct   100 - sum(pct) over ()
        when sum(pct) over () > 100 and rn1 = 1 then pct - 100   sum(pct) over ()
        else pct
    end as pct
from data
order by state, ship_mode;

https://dbfiddle.uk/?rdbms=sqlserver_2019&fiddle=d31b17537e46f86f9fe3ea79ccc4c650

The idea of distributing the rounding errors to work out to neat total is not entirely trivial.

CodePudding user response:

You're in the right direction, but I think you want to divide by the SUM of the counts for each state. There are a few ways to obtain that. Here's one:

SELECT State, Ship_mode
     , CONVERT(DECIMAL(12,3), (COUNT(*) * 100.0) / SUM(COUNT(*)) OVER (PARTITION BY State)) AS pct
  FROM sales_data_clean
 GROUP BY state, ship_mode
;

The result:

State Ship_mode pct
Alabama First Class 10.000
Alabama Same Day 20.000
Alabama Second Class 30.000
Alabama Standard Class 40.000
Arizona First Class 16.667
Arizona Same Day 16.667
Arizona Second Class 33.333
Arizona Standard Class 33.333

As already noted, distributing rounding error can be tricky. While this doesn't always total to exactly 100, given just any data, it's fairly close as a first try. With this test data, we happen to be lucky.

Thanks to shawnt00 for the initial fiddle. I've adjusted the data slightly.

The Test Case:

CREATE TABLE sales_data_clean (
    State     VARCHAR(20)
  , Ship_mode VARCHAR(20)
);

INSERT INTO sales_data_clean (State, Ship_mode) VALUES
   ('Alabama', 'First Class')
 , ('Alabama', 'Second Class')
 , ('Alabama', 'Second Class')
 , ('Alabama', 'Second Class')
 , ('Alabama', 'Standard Class')
 , ('Alabama', 'Standard Class')
 , ('Alabama', 'Standard Class')
 , ('Alabama', 'Standard Class')
 , ('Alabama', 'Same Day')
 , ('Alabama', 'Same Day')
 , ('Arizona', 'First Class')
 , ('Arizona', 'Second Class')
 , ('Arizona', 'Second Class')
 , ('Arizona', 'Standard Class')
 , ('Arizona', 'Standard Class')
 , ('Arizona', 'Same Day')
;
  • Related