Okay so I am in a bit of a pickle here. I have user_id, subscription_id, plan, subscription_start_date, subscription_end_date. I am looking for all the different plans that each user has purchased and the corresponding subscription_ids with only one id per plan(the very first one). The caveat is that subscription ids change every time a user renews a subscription. So lets say user A has only one subscription and has renewed it 3 times hence has 3 different subscription ids, and user B has 2 plans and has renewed it twice so they have 4 subscription ids.
I am looking for user A to have 1 sub_id and 1 plan and user B have 2 sub ids and 2 different plans
Here is my query so far
SELECT H.plan, H.user_id
FROM my_table H
INNER JOIN
(SELECT user_id, plan, MIN(subscription_purchase_date) As first_sub_date
FROM my_table
GROUP BY user_id, plan) X
ON H.user_id= X.user_id AND H.subscription_purchase_date = X.first_sub
user_id | subscription_id | start_date | end_date | plan |
---|---|---|---|---|
A | 123 | 2021-01-01 | 9999-01-01 | Premium |
B | 122 | 2021-02-03 | 9999-03-04 | Regular |
A | 144 | 2021-02-01 | 9999-01-01 | Premium |
A | 155 | 2021-03-01 | 9999-01-01 | Premium |
B | 167 | 2021-03-03 | 9999-03-04 | Regular |
B | 111 | 2020-05-18 | 2021-12-18 | Trial |
B | 187 | 2020-06-18 | 2021-12-18 | Trial |
Desired outcome
user_id | subscription_id | start_date | end_date | plan |
---|---|---|---|---|
A | 123 | 2021-01-01 | 9999-01-01 | Premium |
B | 122 | 2021-02-03 | 9999-03-04 | Regular |
B | 111 | 2020-05-18 | 2021-12-18 | Trial |
Thanks so much and let me know if you need additional info PS I am using Hive/Hadoop
CodePudding user response:
Use row_number() and filter.
Demo with your data example:
with my_table as (--data example, use your table instead of this CTE
select 'A' user_id, 123 subscription_id, '2021-01-01' start_date, '9999-01-01' end_date, 'Premium' plan union all
select 'B', 122, '2021-02-03', '9999-03-04', 'Regular' union all
select 'A', 144, '2021-02-01', '9999-01-01', 'Premium' union all
select 'A', 155, '2021-03-01', '9999-01-01', 'Premium' union all
select 'B', 167, '2021-03-03', '9999-03-04', 'Regular' union all
select 'B', 111, '2020-05-18', '2021-12-18', 'Trial' union all
select 'B', 187, '2020-06-18', '2021-12-18', 'Trial'
)
select user_id, subscription_id, start_date, end_date, plan
from
(
select user_id, subscription_id, start_date, end_date, plan,
--Row with min start_date will be assigned rn=1
row_number() over(partition by user_id, plan order by start_date) rn
from my_table
)s where rn=1
Result:
user_id subscription_id start_date end_date plan
A 123 2021-01-01 9999-01-01 Premium
B 122 2021-02-03 9999-03-04 Regular
B 111 2020-05-18 2021-12-18 Trial