In this sample dataset below, I need to know the time when a staff member sells their second distinct product to their second distinct client.
Staff Member | SalesID | Product | Client | TimeStamp |
---|---|---|---|---|
Tom | 0E5840B53148 | Ford | Hertz | 07:03:36 |
Tom | FD363DD1EFC5 | Ford | Hertz | 07:12:04 |
Tom | 0E5840B53148 | Citroen | Avis | 07:30:00 |
Tom | 29F33DDF6BBB | Renault | Alamo | 07:46:19 |
So in this scenario, for staff member Tom, that criteria would be fulfilled at 07:30 when Tom sells the second distinct product to the second distinct client.
Staff Member | TimeStamp |
---|---|
Tom | 07:30:00 |
I've been able to group by staff members and count how many products they sold and to the number of clients they've sold to but what i need is confirmation of exactly when this criteria is met. If anyone can help, I'd be really appreciate it.
CodePudding user response:
It sounds like you need to use a ROW_NUMBER partition.
Using the format of
ROW_NUMBER ()
OVER ( [ PARTITION BY StaffMember, Client ] ORDER BY TimeStamp )
AS [ROWNUMBER]
You can generate an incremental counter for every distinct combination of StaffMember, and Client. From there you can use both a CTE and WHERE clause to identify all rows where [ROWNUMBER] = 2
In it's current state this does not aggregate by day. The new [ROWNUMBER] counter would tick ever upwards with no upper boundary. Instead we can use a type cast to generate a limitation on Dates in the ROW_NUMBER partition.
ROW_NUMBER ()
OVER ( [ PARTITION BY StaffMember, Client, CAST(TimeStamp AS DATE) ] ORDER BY TimeStamp )
AS [ROWNUMBER]
This should allow you to query your table like so:
SELECT StaffMember, Client TimeStamp,
ROW_NUMBER ()
OVER ( PARTITION BY StaffMember, Client, CAST(TimeStamp AS DATE) ORDER BY TimeStamp )
AS [ROWNUMBER]
FROM YourTable
You'd need to use a CTE to actually query the data as you cannot directly use a windowed function like ROW_NUMBER in a WHERE clause. Which would would look something like this:
WITH CTE AS(
SELECT SalesID,
ROW_NUMBER ()
OVER ( PARTITION BY StaffMember, Client, CAST(TimeStamp AS DATE) ORDER BY TimeStamp )
AS [ROWNUMBER]
FROM YourTable
)
SELECT StaffMember, Client, TimeStamp
FROM YourTable
JOIN CTE ON
YourTable.SalesID = CTE.SalesID
WHERE CTE.ROWNUMBER = 2
CodePudding user response:
The idea behind this, is that you first group the data, so that identical products have only one row (with the lowest timestamp) in case it is the second. Then use ROW_NUMBER too get the second, for every staff member.
CREATE TABLE tab1 ([Staff Member] varchar(3), [SalesID] varchar(12), [Product] varchar(7), [Client] varchar(5), [TimeStamp] varchar(8)) ; INSERT INTO tab1 ([Staff Member], [SalesID], [Product], [Client], [TimeStamp]) VALUES ('Tom', '0E5840B53148', 'Ford', 'Hertz', '07:03:36'), ('Tom', 'FD363DD1EFC5', 'Ford', 'Hertz', '07:12:04'), ('Tom', '0E5840B53148', 'Citroen', 'Avis', '07:30:00'), ('Tom', '29F33DDF6BBB', 'Renault', 'Alamo', '07:46:19') ;
WITH CTE as (SELECT [Staff Member],[Product], [Client], MIN([TimeStamp]) as [TimeStamp] , ROW_NUMBER() OVER( PARTITION BY [Staff Member] ORDER BY MAX([TimeStamp])) rn FROM tab1 GROUP BY [Staff Member],[Product], [Client]) SELECT [Staff Member],[TimeStamp] FROM CTE WHERE rn = 2
Staff Member | TimeStamp :----------- | :-------- Tom | 07:30:00
db<>fiddle here