in the example data set above, I want to assign a Primary_Store for each buyer_id based on the following criteria
- if one store has more visit counts than any other store for that buyer, assign as primary
- if no, then ---- > if stores with most visits are equal, assign Primary_Store according to hierarchy of store_type: 1. Retail, 2.Online, 3.Event ------------> if stores with most visits are equal, and store_type is the same, then assign Primary_Store as store with earliest visit date
By this logic the results should be: Buyer ID : Primary_Store 1001: Whole Foods 1002: Farmer's Market 1003: Costco
I have been trying to use a case when statement but i am having difficulty applying the different types of criteria at once.
CodePudding user response:
So the simple way is to use a RANKING window function of which ROW_NUMBER is one, and it does not allocate duplicate rank values.
thus:
SELECT
*
,decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3) as rnk2
,row_number() over (partition by buyer_id order by visit_count desc, rnk2, first_visit) as rn
FROM VALUES
(1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
(1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
(1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
(1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
t(buyer_id, store, first_visit, store_type, visit_count)
for this partial copy of your data, we can use three priorities in the ranking ORDER BY
BUYER_ID | STORE | FIRST_VISIT | STORE_TYPE | VISIT_COUNT | RNK2 | RN |
---|---|---|---|---|---|---|
1001 | Whole Foods | 2020-05-31 | Retail | 10 | 1 | 1 |
1001 | Instacart | 2020-03-15 | Online | 10 | 2 | 2 |
1001 | Farmer"s Market | 2020-01-24 | Event | 4 | 3 | 3 |
1002 | Trader Joes | 2020-02-12 | Retail | 9 | 1 | 1 |
thus it can be moved to a QUALIFY like and we can take only the first value:
SELECT
*
FROM VALUES
(1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
(1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
(1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
(1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
t(buyer_id, store, first_visit, store_type, visit_count)
qualify row_number() over (
partition by buyer_id
order by visit_count desc,
decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3),
first_visit) = 1
;
gives:
BUYER_ID | STORE | FIRST_VISIT | STORE_TYPE | VISIT_COUNT |
---|---|---|---|---|
1001 | Whole Foods | 2020-05-31 | Retail | 10 |
1002 | Trader Joes | 2020-02-12 | Retail | 9 |
So to reflect Rajat's answer of using FIRST_VALUE to retrieve the "primary store" as a value, which opens the idea, that you might also just want to know "is this the one". which can be done with:
SELECT
*
,iff(row_number() over (
partition by buyer_id
order by visit_count desc,
decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3),
first_visit) = 1, 'yes', 'no' ) as primary_store
FROM your_table
BUYER_ID | STORE | FIRST_VISIT | STORE_TYPE | VISIT_COUNT | PRIMARY_STORE |
---|---|---|---|---|---|
1001 | Whole Foods | 2020-05-31 | Retail | 10 | yes |
1001 | Instacart | 2020-03-15 | Online | 10 | no |
1001 | Farmer"s Market | 2020-01-24 | Event | 4 | no |
1002 | Trader Joes | 2020-02-12 | Retail | 9 | yes |
the basic answer, of how can you rank the data, is the same between the methods, it then comes down to how you wish to use that rank.
CodePudding user response:
You seem to be looking for way to assign a primary store type to existing rows without collapsing them. We can modify Simeon's solution a little bit. I swapped decode
out for array_position
for personal preference but you can go with either of the methods
select *, first_value(store_type) over
(partition by buyer_id
order by visit_count desc,
array_position(store_type::variant,['Retail','Online', 'Event']),
first_visit asc) as primary_store
from your_table;