Home > OS >  Retrieve multiple information in a group by
Retrieve multiple information in a group by

Time:11-04

I have a table 'purchases' with 3 columns Serviceid, Date, User_id

Some data looks like this:

20  2-Jan-18    40709217
20  2-Jan-18    40709217
40  2-Jan-18    40709217
40  2-Jan-20    40709217
50  2-Jan-21    40709217    
984 22-Mar-18   18246539    
269 22-Mar-18   18246539    
666 1-Apr-18    18246539

My query request is:

For each 'user_id', get these information:

  • First 2 earliest ServiceId and Date that user purchased
  • The lastest ServiceId and Date that user purchased
  • Count of services that user purchased

Result table's column must follow this order: User_id, FirstServiceid, SecondServiceid, FirstServiceDate, SecondServiceDate, LastServiceid, LastServiceDate, TotalService.

Expected output:

User_id FirstServiceid SecondServiceid FirstDate SecondDate LastServiceid LastDate TotalServices
40709217 20 40 2-Jan-18 2-Jan-20 50 2-Jan-21 5
18246539 984 666 22-Mar-18 1-Apr-18 666 1-Apr-18 3

My idea was to do aggregations, then join them together but I ran into this error

"Column 'purchase.Serviceid' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause"

when calling the lastest Serviceid and Date:

select user_id, max(date), serviceid    
from purchases    
group by user_id

How to overcome this and is there a better way to aggregate lots of information without having to use JOIN?

CodePudding user response:

You may try using the following which uses window functions such as row_number and dense_rank to identify desired records before aggregating your results by user_id and using a case expression within the MAX function to extract the data in the desired columns

SELECT
    User_id,
    MAX(CASE WHEN earliest_order=1 and rp=1 THEN Serviceid END) as FirstServiceid,
    MAX(CASE WHEN earliest_order=2 and rp=1 THEN Serviceid END) as SecondServiceid,
    MAX(CASE WHEN earliest_order=1 and rp=1 THEN [Date] END) as FirstServiceDate,
    MAX(CASE WHEN earliest_order=2 and rp=1 THEN [Date] END) as SecondServiceDate,
    MAX(CASE WHEN latest_order=1 THEN Serviceid END) as LastServiceid,
    MAX(CASE WHEN latest_order=1 THEN [Date] END) as LastServiceDate,
    COUNT(1) as TotalService
FROM (
   SELECT 
       *,
       DENSE_RANK() OVER (
           PARTITION BY User_id
           ORDER BY [Date] 
       ) as earliest_order,
       ROW_NUMBER() OVER (
           PARTITION BY User_id
           ORDER BY [Date] DESC, Serviceid
       ) as latest_order,
       ROW_NUMBER() OVER (
              PARTITION BY User_id,[Date]
              ORDER BY Serviceid DESC
       ) rp
   FROM 
       purchases
   
) t
GROUP BY
    User_id
ORDER BY
    User_id DESC

The results of the subquery used above have been included below also for your perusal.

   SELECT
        *,
       DENSE_RANK() OVER (
           PARTITION BY User_id
           ORDER BY [Date] 
       ) as earliest_order,
       ROW_NUMBER() OVER (
           PARTITION BY User_id
           ORDER BY [Date] DESC, Serviceid
       ) as latest_order,
       ROW_NUMBER() OVER (
              PARTITION BY User_id,[Date]
              ORDER BY Serviceid DESC
       ) rank_priority
   FROM
           purchases
    ORDER BY User_id, [Date], Serviceid DESC
Serviceid Date User_id earliest_order latest_order rank_priority
984 2018-03-22 18246539 1 3 1
269 2018-03-22 18246539 1 2 2
666 2018-04-01 18246539 2 1 1
40 2018-01-02 40709217 1 5 1
20 2018-01-02 40709217 1 3 2
20 2018-01-02 40709217 1 4 3
40 2020-01-02 40709217 2 2 1
50 2021-01-02 40709217 3 1 1

View working demo db fiddle here

  •  Tags:  
  • sql
  • Related