Partition by order of partition columns-CodePudding

I have this window function appled that looks like this:

SUM(value) OVER (PARTITION BY product, service, site ORDER BY region, site, service, product, year, week ASC ROWS BETWEEN 12 PRECEDING AND 0 PRECEDEING) AS value

Querry is working fine but I want to understand more of the window function, I have two questions:

Does the partition columns order matters (PARTITION BY product, service, site)?
Do I need to specify columns from point 1 in ORDER BY clause or can I ommit them?

CodePudding user response：

Does the partition columns order matter?

No.

It makes no difference whether you have PARTITION BY product, service, site or PARTITION BY site, service, product or any other ordering of those three columns.

The members of a partition will be the rows that share the same values for all three of the columns and it doesn't matter how you order them.

Do I need to specify columns from point 1 in ORDER BY clause or can I omit them?

You don't need to specify the columns.

Don't do

PARTITION BY product, service, site 
ORDER BY region, site, service, product, year, week

Just do

PARTITION BY product, service, site 
ORDER BY region, year, week

All of the rows within a partition will have the same values for all of product, service, site so it doesn't add anything to include these in the ordering.

The query is clearer with these "no-op" elements removed.

CodePudding user response：

The partition by clause specifies the granularity. In your case it is saying look at all the rows of a given product for a given service at a given site.
Since it is the granularity the order doesn't matter so you can have the clause

PARTITION BY product, service, site)
PARTITION BY service, site, product)
PARTITION BY site, product, service)

The Order by clause on the other hand is really important. You have to really think about how you want the function to operate. For example:

if you are using a row_number() dense_rank() rank functions:

row_number() over(partition by user order by sales desc) would generate a rank starting from 1 based on the granularity. In this case, the rank would start per user for a given sales value, but since it is desc the rank will start from higher to lower and vice versa. A similar thing will happen to all the above-mentioned functions.

First_value() operates the same, it fetches the first value of a given column, if you would like to get the user_id with lowest sales. You can get it using first_value() with order by ASC, but DESC would give you the id that has highest sales i.e equivalent to the answer of LAST_VALUE as an example, but if you put the order by clause as a desc, it will actually give you the last value

I would suggest to create a small data set and play with ORDER BY caluse for all the functions that you are interested in