I have a table 'purchases' with 3 columns Serviceid, Date, User_id
Some data looks like this:
20 2-Jan-18 40709217
20 2-Jan-18 40709217
40 2-Jan-18 40709217
40 2-Jan-20 40709217
50 2-Jan-21 40709217
984 22-Mar-18 18246539
269 22-Mar-18 18246539
666 1-Apr-18 18246539
My query request is:
For each 'user_id', get these information:
- First 2 earliest ServiceId and Date that user purchased
- The lastest ServiceId and Date that user purchased
- Count of services that user purchased
Result table's column must follow this order: User_id, FirstServiceid, SecondServiceid, FirstServiceDate, SecondServiceDate, LastServiceid, LastServiceDate, TotalService.
Expected output:
User_id | FirstServiceid | SecondServiceid | FirstDate | SecondDate | LastServiceid | LastDate | TotalServices |
---|---|---|---|---|---|---|---|
40709217 | 20 | 40 | 2-Jan-18 | 2-Jan-20 | 50 | 2-Jan-21 | 5 |
18246539 | 984 | 666 | 22-Mar-18 | 1-Apr-18 | 666 | 1-Apr-18 | 3 |
My idea was to do aggregations, then join them together but I ran into this error
"Column 'purchase.Serviceid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"
when calling the lastest Serviceid and Date:
select user_id, max(date), serviceid
from purchases
group by user_id
How to overcome this and is there a better way to aggregate lots of information without having to use JOIN?
CodePudding user response:
You may try using the following which uses window functions such as row_number
and dense_rank
to identify desired records before aggregating your results by user_id
and using a case expression within the MAX
function to extract the data in the desired columns
SELECT
User_id,
MAX(CASE WHEN earliest_order=1 and rp=1 THEN Serviceid END) as FirstServiceid,
MAX(CASE WHEN earliest_order=2 and rp=1 THEN Serviceid END) as SecondServiceid,
MAX(CASE WHEN earliest_order=1 and rp=1 THEN [Date] END) as FirstServiceDate,
MAX(CASE WHEN earliest_order=2 and rp=1 THEN [Date] END) as SecondServiceDate,
MAX(CASE WHEN latest_order=1 THEN Serviceid END) as LastServiceid,
MAX(CASE WHEN latest_order=1 THEN [Date] END) as LastServiceDate,
COUNT(1) as TotalService
FROM (
SELECT
*,
DENSE_RANK() OVER (
PARTITION BY User_id
ORDER BY [Date]
) as earliest_order,
ROW_NUMBER() OVER (
PARTITION BY User_id
ORDER BY [Date] DESC, Serviceid
) as latest_order,
ROW_NUMBER() OVER (
PARTITION BY User_id,[Date]
ORDER BY Serviceid DESC
) rp
FROM
purchases
) t
GROUP BY
User_id
ORDER BY
User_id DESC
The results of the subquery used above have been included below also for your perusal.
SELECT
*,
DENSE_RANK() OVER (
PARTITION BY User_id
ORDER BY [Date]
) as earliest_order,
ROW_NUMBER() OVER (
PARTITION BY User_id
ORDER BY [Date] DESC, Serviceid
) as latest_order,
ROW_NUMBER() OVER (
PARTITION BY User_id,[Date]
ORDER BY Serviceid DESC
) rank_priority
FROM
purchases
ORDER BY User_id, [Date], Serviceid DESC
Serviceid | Date | User_id | earliest_order | latest_order | rank_priority |
---|---|---|---|---|---|
984 | 2018-03-22 | 18246539 | 1 | 3 | 1 |
269 | 2018-03-22 | 18246539 | 1 | 2 | 2 |
666 | 2018-04-01 | 18246539 | 2 | 1 | 1 |
40 | 2018-01-02 | 40709217 | 1 | 5 | 1 |
20 | 2018-01-02 | 40709217 | 1 | 3 | 2 |
20 | 2018-01-02 | 40709217 | 1 | 4 | 3 |
40 | 2020-01-02 | 40709217 | 2 | 2 | 1 |
50 | 2021-01-02 | 40709217 | 3 | 1 | 1 |