Tearing my hair out over this one and not 100% sure it's possible, but everything is possible in SQL so it must be?!
The aim is to be able to historically track how many active accounts there were at the end of each month.
I have two tables:
star_UserAccounts
- Holds account information every hour.
- If the data is the same, nothing happens.
- If the data changes, the existing row is flagged as old and the new row is inserted. This enables a nice little history to see what changes are being made.
LastDayOfMonth
- Holds dates that I'd like the data for (just as an example)
Within the "star_UserAccounts" table I have (Ascending):
UserId | UserStatus | InsertedDate | ValidToDate | Row Version |
---|---|---|---|---|
JoeBloggs | Active | 2019-07-19 13:43:09.083 | 2019-10-31 16:08:27.633 | 1 |
JoeBloggs | Active | 2019-10-31 16:08:28.027 | 2020-01-09 10:08:27.840 | 2 |
JoeBloggs | Active | 2020-01-09 10:08:28.013 | 2020-01-09 11:08:28.813 | 3 |
JoeBloggs | Active | 2020-01-09 11:08:28.970 | 2020-01-16 11:08:24.547 | 4 |
Now - I would like to return the details that were valid on the 31st October 2019. There were two updates on that day and therefore the LATEST RowVersion for that day should be returned.
I can get the individual row version for the account when hardcoding the date into my select script, however when I then try to apply this across multiple different dates in one go, it is picking up both rows for the 31st October 2019, but I only want it to fetch the latest row on the days needed.
LastDayOfMonth | ActiveAtDate |
---|---|
2019-07-31 | 1 |
2019-08-31 | 1 |
2019-09-30 | 1 |
2019-10-31 | 2 <-- The problem row |
2019-11-30 | 1 |
2019-12-31 | 1 |
How an earth can I make sure the latest row is pulled for each date that is being pushed in?
Here is a SQL Fiddle to play with: http://sqlfiddle.com/#!18/ed372/12
I have kept in the syntax error which is within the subquery, as this sort of shows what I am trying to achieve as I need to pass the date into the sub query that collects the MAX record for the date.
Appreciate any assistance. Such a hard one to explain in an easy way, apologies if it's very confusing.
Thanks.
CodePudding user response:
You are over-complicating it. You can use simple row-numbering to do this
SELECT
LastDayOfMonth,
COUNT(*) ActiveAtDate
FROM (
SELECT
ld.LastDayOfMonth,
ua.UserId,
rn = ROW_NUMBER() OVER (PARTITION BY ld.LastDayOfMonth, ua.UserId ORDER BY ua.ValidToDate DESC)
FROM dbo.star_UserAccounts ua
INNER JOIN LastDayOfMonth ld
ON ld.LastDayOfMonth >= ua.InsertedDate
AND DATEADD(day, 1, ld.LastDayOfMonth) < ua.ValidToDate
WHERE ua.UserId = 'JoeBloggs'
) latestRowForDate
WHERE rn = 1
GROUP BY LastDayOfMonth;
CodePudding user response:
UPDATE
After re-reading your post I realized I had not fully understood your question prior to answering. I will leave my original answer as it may prove to be helpful.
Your situation may be as simple as using COUNT DISTINCT
, like so:
SELECT
LastDayOfMonth,
COUNT ( DISTINCT UserId ) AS ActiveAtDate
FROM @star_UserAccounts AS ua
CROSS APPLY (
SELECT LastDayOfMonth FROM @LastDayOfMonth AS ld
WHERE ld.LastDayOfMonth BETWEEN CAST( ua.InsertedDate AS date ) AND CAST( ua.ValidToDate AS date )
) AS x
GROUP BY
LastDayOfMonth
ORDER BY
LastDayOfMonth;
RETURNS
---------------- --------------
| LastDayOfMonth | ActiveAtDate |
---------------- --------------
| 2019-07-31 | 1 |
| 2019-08-31 | 1 |
| 2019-09-30 | 1 |
| 2019-10-31 | 1 |
| 2019-11-30 | 1 |
| 2019-12-31 | 1 |
---------------- --------------
Using CROSS APPLY
still has the same effect in limiting the resultset to users with activity during the period in the LastDayOfMonth table.
ORIGINAL ANSWER BELOW
Try something like the following:
DECLARE @star_UserAccounts table (
[UserId] nvarchar (20) COLLATE Latin1_General_CI_AS NOT NULL,
[InsertedDate] datetime NULL,
[ValidToDate] datetime NULL,
[RowVersion] int NOT NULL
);
INSERT INTO @star_UserAccounts ( [UserId], [InsertedDate], [ValidToDate], [RowVersion] )
VALUES
('JoeBloggs', '2020-01-09 11:08:28.970', '2020-01-16 11:08:24.547', 4 ),
('JoeBloggs', '2020-01-09 10:08:28.013', '2020-01-09 11:08:28.813', 3 ),
('JoeBloggs', '2019-10-31 16:08:28.027', '2020-01-09 10:08:27.840', 2 ),
('JoeBloggs', '2019-07-19 13:43:09.083', '2019-10-31 16:08:27.633', 1 );
DECLARE @LastDayOFMonth table (
LastDayOfMonth date
);
INSERT INTO @LastDayOFMonth ([LastDayOfMonth])
VALUES
('2019-07-31 00:00:00' ),
('2019-08-31 00:00:00' ),
('2019-09-30 00:00:00' ),
('2019-10-31 00:00:00' ),
('2019-11-30 00:00:00' ),
('2019-12-31 00:00:00' );
;WITH current_activity AS (
SELECT
UserID, MAX( RowVersion ) AS CurrentVersion
FROM @star_UserAccounts
GROUP BY
UserID
)
SELECT
ua.*
FROM current_activity
INNER JOIN @star_UserAccounts AS ua
ON current_activity.UserID = ua.UserID
AND current_activity.CurrentVersion = ua.[RowVersion]
WHERE EXISTS (
SELECT * FROM @star_UserAccounts AS u
CROSS APPLY (
SELECT * FROM @LastDayOfMonth AS ld
WHERE CAST( ld.LastDayOfMonth AS date ) BETWEEN CAST( u.InsertedDate AS date ) AND CAST( u.ValidToDate AS date )
) AS d
WHERE
u.UserId = current_activity.UserId
)
ORDER BY
current_activity.UserId;
RETURNS
----------- ------------------------- ------------------------- ------------
| UserId | InsertedDate | ValidToDate | RowVersion |
----------- ------------------------- ------------------------- ------------
| JoeBloggs | 2020-01-09 11:08:28.970 | 2020-01-16 11:08:24.547 | 4 |
----------- ------------------------- ------------------------- ------------
The CTE gets the most current UserId's RowVersion and then looks to see if the user had any activity during the dates listed in LastDayOfMonth.
Using CROSS APPLY
restricts the results to users with any activity that includes the LastDayOfMonth.