How to assign a primary value from Column B based on hierarchy of criteria across multiple columns (-CodePudding

in the example data set above, I want to assign a Primary_Store for each buyer_id based on the following criteria

if one store has more visit counts than any other store for that buyer, assign as primary
if no, then ---- > if stores with most visits are equal, assign Primary_Store according to hierarchy of store_type: 1. Retail, 2.Online, 3.Event ------------> if stores with most visits are equal, and store_type is the same, then assign Primary_Store as store with earliest visit date

By this logic the results should be: Buyer ID : Primary_Store 1001: Whole Foods 1002: Farmer's Market 1003: Costco

I have been trying to use a case when statement but i am having difficulty applying the different types of criteria at once.

CodePudding user response：

So the simple way is to use a RANKING window function of which ROW_NUMBER is one, and it does not allocate duplicate rank values.

thus:

SELECT 
    *
    ,decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3) as rnk2
    ,row_number() over (partition by buyer_id order by visit_count desc, rnk2, first_visit) as rn
FROM VALUES
    (1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
    (1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
    (1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
    (1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
    
    t(buyer_id, store, first_visit, store_type, visit_count)

for this partial copy of your data, we can use three priorities in the ranking ORDER BY

BUYER_ID	STORE	FIRST_VISIT	STORE_TYPE	VISIT_COUNT	RNK2	RN
1001	Whole Foods	2020-05-31	Retail	10	1	1
1001	Instacart	2020-03-15	Online	10	2	2
1001	Farmer"s Market	2020-01-24	Event	4	3	3
1002	Trader Joes	2020-02-12	Retail	9	1	1

thus it can be moved to a QUALIFY like and we can take only the first value:

SELECT 
    *
FROM VALUES
    (1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
    (1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
    (1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
    (1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
    
    t(buyer_id, store, first_visit, store_type, visit_count)
qualify row_number() over (
        partition by buyer_id 
        order by visit_count desc, 
            decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3), 
            first_visit) = 1
    ;

gives:

BUYER_ID	STORE	FIRST_VISIT	STORE_TYPE	VISIT_COUNT
1001	Whole Foods	2020-05-31	Retail	10
1002	Trader Joes	2020-02-12	Retail	9

So to reflect Rajat's answer of using FIRST_VALUE to retrieve the "primary store" as a value, which opens the idea, that you might also just want to know "is this the one". which can be done with:

SELECT 
    *
    ,iff(row_number() over (
        partition by buyer_id 
        order by visit_count desc, 
            decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3), 
            first_visit) = 1, 'yes', 'no' ) as primary_store
FROM your_table

BUYER_ID	STORE	FIRST_VISIT	STORE_TYPE	VISIT_COUNT	PRIMARY_STORE
1001	Whole Foods	2020-05-31	Retail	10	yes
1001	Instacart	2020-03-15	Online	10	no
1001	Farmer"s Market	2020-01-24	Event	4	no
1002	Trader Joes	2020-02-12	Retail	9	yes

the basic answer, of how can you rank the data, is the same between the methods, it then comes down to how you wish to use that rank.

CodePudding user response：

You seem to be looking for way to assign a primary store type to existing rows without collapsing them. We can modify Simeon's solution a little bit. I swapped decode out for array_position for personal preference but you can go with either of the methods

select *, first_value(store_type) over 
         (partition by buyer_id 
          order by visit_count desc, 
                   array_position(store_type::variant,['Retail','Online', 'Event']), 
                   first_visit asc) as primary_store
from your_table;