Home > OS >  How to get the first subscription for each user (given that subscription ids change every time it re
How to get the first subscription for each user (given that subscription ids change every time it re

Time:10-06

Okay so I am in a bit of a pickle here. I have user_id, subscription_id, plan, subscription_start_date, subscription_end_date. I am looking for all the different plans that each user has purchased and the corresponding subscription_ids with only one id per plan(the very first one). The caveat is that subscription ids change every time a user renews a subscription. So lets say user A has only one subscription and has renewed it 3 times hence has 3 different subscription ids, and user B has 2 plans and has renewed it twice so they have 4 subscription ids.

I am looking for user A to have 1 sub_id and 1 plan and user B have 2 sub ids and 2 different plans

Here is my query so far

SELECT H.plan, H.user_id
FROM my_table H
INNER JOIN
    (SELECT user_id, plan, MIN(subscription_purchase_date) As first_sub_date
    FROM my_table
    GROUP BY user_id, plan) X
ON H.user_id= X.user_id AND H.subscription_purchase_date = X.first_sub

user_id subscription_id start_date end_date plan
A 123 2021-01-01 9999-01-01 Premium
B 122 2021-02-03 9999-03-04 Regular
A 144 2021-02-01 9999-01-01 Premium
A 155 2021-03-01 9999-01-01 Premium
B 167 2021-03-03 9999-03-04 Regular
B 111 2020-05-18 2021-12-18 Trial
B 187 2020-06-18 2021-12-18 Trial

Desired outcome

user_id subscription_id start_date end_date plan
A 123 2021-01-01 9999-01-01 Premium
B 122 2021-02-03 9999-03-04 Regular
B 111 2020-05-18 2021-12-18 Trial

Thanks so much and let me know if you need additional info PS I am using Hive/Hadoop

CodePudding user response:

Use row_number() and filter.

Demo with your data example:

with my_table as (--data example, use your table instead of this CTE            
select 'A' user_id, 123 subscription_id, '2021-01-01' start_date, '9999-01-01' end_date, 'Premium' plan union all
select 'B', 122, '2021-02-03', '9999-03-04', 'Regular' union all
select 'A', 144, '2021-02-01', '9999-01-01', 'Premium' union all
select 'A', 155, '2021-03-01', '9999-01-01', 'Premium' union all
select 'B', 167, '2021-03-03', '9999-03-04', 'Regular' union all
select 'B', 111, '2020-05-18', '2021-12-18', 'Trial' union all
select 'B', 187, '2020-06-18', '2021-12-18', 'Trial'
)

select user_id, subscription_id, start_date, end_date, plan
from 
(
select user_id, subscription_id, start_date, end_date, plan,
       --Row with min start_date will be assigned rn=1
       row_number() over(partition by user_id, plan order by start_date) rn
from my_table
)s where rn=1

Result:

user_id subscription_id start_date  end_date    plan
A       123             2021-01-01  9999-01-01  Premium
B       122             2021-02-03  9999-03-04  Regular
B       111             2020-05-18  2021-12-18  Trial
  • Related