Home > Software engineering >  How to assign a primary value from Column B based on hierarchy of criteria across multiple columns (
How to assign a primary value from Column B based on hierarchy of criteria across multiple columns (

Time:11-29

enter image description here

in the example data set above, I want to assign a Primary_Store for each buyer_id based on the following criteria

  • if one store has more visit counts than any other store for that buyer, assign as primary
  • if no, then ---- > if stores with most visits are equal, assign Primary_Store according to hierarchy of store_type: 1. Retail, 2.Online, 3.Event ------------> if stores with most visits are equal, and store_type is the same, then assign Primary_Store as store with earliest visit date

By this logic the results should be: Buyer ID : Primary_Store 1001: Whole Foods 1002: Farmer's Market 1003: Costco

I have been trying to use a case when statement but i am having difficulty applying the different types of criteria at once.

CodePudding user response:

So the simple way is to use a RANKING window function of which ROW_NUMBER is one, and it does not allocate duplicate rank values.

thus:

SELECT 
    *
    ,decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3) as rnk2
    ,row_number() over (partition by buyer_id order by visit_count desc, rnk2, first_visit) as rn
FROM VALUES
    (1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
    (1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
    (1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
    (1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
    
    t(buyer_id, store, first_visit, store_type, visit_count)

for this partial copy of your data, we can use three priorities in the ranking ORDER BY

BUYER_ID STORE FIRST_VISIT STORE_TYPE VISIT_COUNT RNK2 RN
1001 Whole Foods 2020-05-31 Retail 10 1 1
1001 Instacart 2020-03-15 Online 10 2 2
1001 Farmer"s Market 2020-01-24 Event 4 3 3
1002 Trader Joes 2020-02-12 Retail 9 1 1

thus it can be moved to a QUALIFY like and we can take only the first value:

SELECT 
    *
FROM VALUES
    (1001, 'Whole Foods', '2020-05-31'::date, 'Retail', 10),
    (1001, 'Instacart', '2020-03-15'::date, 'Online', 10),
    (1001, 'Farmer"s Market', '2020-01-24'::date, 'Event', 4),
    (1002, 'Trader Joes', '2020-02-12'::date, 'Retail', 9)
    
    t(buyer_id, store, first_visit, store_type, visit_count)
qualify row_number() over (
        partition by buyer_id 
        order by visit_count desc, 
            decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3), 
            first_visit) = 1
    ;

gives:

BUYER_ID STORE FIRST_VISIT STORE_TYPE VISIT_COUNT
1001 Whole Foods 2020-05-31 Retail 10
1002 Trader Joes 2020-02-12 Retail 9

So to reflect Rajat's answer of using FIRST_VALUE to retrieve the "primary store" as a value, which opens the idea, that you might also just want to know "is this the one". which can be done with:

SELECT 
    *
    ,iff(row_number() over (
        partition by buyer_id 
        order by visit_count desc, 
            decode(store_type, 'Retail', 1, 'Online', 2, 'Event', 3), 
            first_visit) = 1, 'yes', 'no' ) as primary_store
FROM your_table
BUYER_ID STORE FIRST_VISIT STORE_TYPE VISIT_COUNT PRIMARY_STORE
1001 Whole Foods 2020-05-31 Retail 10 yes
1001 Instacart 2020-03-15 Online 10 no
1001 Farmer"s Market 2020-01-24 Event 4 no
1002 Trader Joes 2020-02-12 Retail 9 yes

the basic answer, of how can you rank the data, is the same between the methods, it then comes down to how you wish to use that rank.

CodePudding user response:

You seem to be looking for way to assign a primary store type to existing rows without collapsing them. We can modify Simeon's solution a little bit. I swapped decode out for array_position for personal preference but you can go with either of the methods

select *, first_value(store_type) over 
         (partition by buyer_id 
          order by visit_count desc, 
                   array_position(store_type::variant,['Retail','Online', 'Event']), 
                   first_visit asc) as primary_store
from your_table;
  • Related