I am trying to work out the SQL I would need to select certain records, here is an example of what I'm trying to do:
Project number | Active/Inactive | Date |
---|---|---|
1 | A | 1/1/20 |
1 | I | 3/1/20 |
1 | A | 5/1/20 |
1 | I | 7/1/20 |
1 | I | 9/1/20 |
2 | I | 1/1/19 |
2 | A | 5/1/19 |
3 | A | 1/3/20 |
3 | I | 3/3/20 |
3 | I | 5/3/20 |
Note: A=Active project, I=Inactive.
What I would like to do is for each project where the project is currently inactive (i.e. the latest date for the project in the above table is set to I), return the row of the longest time ago it was made inactive, but NOT before it was last active (hope this is understandable!). So for the above table the following would be returned:
Project number | Active/Inactive | Date |
---|---|---|
1 | I | 7/1/20 |
3 | I | 3/3/20 |
So proj number 1 is inactive and the earliest time it was made inactive (after the last time it was active) is 7/1/20. Project 2 is not selected as it is currently active. Project 3 is inactive and the earliest time it was made inactive (after the last time it was active) is 3/3/20.
Thanks.
CodePudding user response:
You could use the 'row_number' function to help you.
create TABLE #PROJECT(ProjectNumber int, [Status] varcha(1), [Date] date)
INSERT INTO #PROJECT VALUES
(1 ,'A' ,'1/1/20'),
(1 ,'I' ,'3/1/20'),
(1 ,'A' ,'5/1/20'),
(1 ,'I' ,'7/1/20'),
(1 ,'I' ,'9/1/20'),
(2 ,'I' ,'1/1/19'),
(2 ,'A' ,'5/1/19'),
(3 ,'A' ,'1/3/20'),
(3 ,'I' ,'3/3/20'),
(3 ,'I' ,'5/3/20')
select * from
(SELECT
row_number() over (partition by projectNumber order by [date]) as [index]
,*
FROM
#PROJECT
WHERE
[STATUS] = 'I'
) as a where [index] = 1
CodePudding user response:
Using some effective date joins, this should work. I am using SQL Server. Create your tables and set up the same data set you provided:
CREATE TABLE dbo.PROJECTS
(
PROJ_NUM int NULL,
STTS char(1) NULL,
STTS_DT date NULL
) ON [PRIMARY]
GO
INSERT INTO dbo.PROJECTS values (1, 'A', '1/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '3/1/20');
INSERT INTO dbo.PROJECTS values (1, 'A', '5/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '7/1/20');
INSERT INTO dbo.PROJECTS values (1, 'I', '9/1/20');
INSERT INTO dbo.PROJECTS values (2, 'I', '1/1/19');
INSERT INTO dbo.PROJECTS values (2, 'A', '5/1/19');
INSERT INTO dbo.PROJECTS values (3, 'A', '1/3/20');
INSERT INTO dbo.PROJECTS values (3, 'I', '3/3/20');
INSERT INTO dbo.PROJECTS values (3, 'I', '5/3/20');
Write a sub-query that filters out just to the projects that are INACTIVE:
-- sub-query that gives you projects that are inactive
SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS CURRSTTS
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
;
Write another sub-query that provides you the last active status date for each project:
-- sub-query that gives you last active status date for each project
SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A')
;
Combine those two sub-queries into a query that gives you the list of inactive projects with their last active status date:
-- sub-query using the 2 above to show only inactive projects with last active stts date
SELECT CURRSTTS.PROJ_NUM, CURRSTTS.STTS, CURRSTTS.STTS_DT, LASTACTV.STTS_DT AS LASTACTV_STTS_DT FROM dbo.PROJECTS CURRSTTS
INNER JOIN
(SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A'))
LASTACTV ON CURRSTTS.PROJ_NUM = LASTACTV.PROJ_NUM
WHERE CURRSTTS.STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
Add one more layer to the query that selects the MIN(STTS_DT) that is greater than the LASTACTV_STTS_DT:
-- final query that uses above sub-query
SELECT P.PROJ_NUM, P.STTS, P.STTS_DT
FROM dbo.PROJECTS P
INNER JOIN (
SELECT CURRSTTS.PROJ_NUM, CURRSTTS.STTS, CURRSTTS.STTS_DT, LASTACTV.STTS_DT AS LASTACTV_STTS_DT FROM dbo.PROJECTS CURRSTTS
INNER JOIN
(SELECT PROJ_NUM, STTS, STTS_DT FROM dbo.PROJECTS LASTACTV
WHERE STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = LASTACTV.PROJ_NUM AND ALLP.STTS = 'A'))
LASTACTV ON CURRSTTS.PROJ_NUM = LASTACTV.PROJ_NUM
WHERE CURRSTTS.STTS_DT = (SELECT MAX(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = CURRSTTS.PROJ_NUM)
AND CURRSTTS.STTS = 'I'
) SUB ON SUB.PROJ_NUM = P.PROJ_NUM
WHERE P.STTS_DT = (SELECT MIN(STTS_DT) FROM dbo.PROJECTS ALLP WHERE ALLP.PROJ_NUM = P.PROJ_NUM AND ALLP.STTS_DT > SUB.LASTACTV_STTS_DT)
The result I get back matches your desired result:
CodePudding user response:
"Greatest n-per group" is the thing to look up when you run accross a problem like this again. Here is a query that will get what you need in postgresSQL.
I realized I changed your column to a boolean, but you will get the gist.
with most_recent_projects as (
select project_number, max(date) date from testtable group by project_number
),
currently_inactive_projects as (
select t.project_number, t.date from testtable t join most_recent_projects mrp on t.project_number = mrp.project_number and t.date = mrp.date where not t.active
),
last_active_date as (
select project_number, date from (
select t.project_number, rank() OVER (
PARTITION BY t.project_number
ORDER BY t.date DESC), t.date
from currently_inactive_projects cip join testtable t on t.project_number = cip.project_number where t.active) t1 where rank = 1
)
-- oldest inactive -- ie, result
select t.project_number, t.active, min(t.date) from last_active_date lad join testtable t on lad.project_number = t.project_number and t.date > lad.date group by t.project_number, t.active;
CodePudding user response:
This is a variation of "gaps and islands" problem.
The query may be like this
SELECT
num,
status,
MIN(date) AS date
FROM (
SELECT
*,
MAX(group_id) OVER (PARTITION BY num) AS max_group_id
FROM (
SELECT
*,
SUM(CASE WHEN status = prev_status THEN 0 ELSE 1 END) OVER (PARTITION BY num ORDER BY date) AS group_id
FROM (
SELECT
*,
LAG(status) OVER (PARTITION BY num ORDER BY date) AS prev_status
FROM projects
) groups
) islands
) q
WHERE status = 'I' AND group_id = max_group_id
GROUP BY num, status
ORDER BY num
Another approach using CTEs
WITH last_status AS (
SELECT
*
FROM (
SELECT
*,
ROW_NUMBER() OVER (PARTITION BY num ORDER BY date DESC) AS rn
FROM projects
) rns
WHERE rn = 1
),
last_active AS (
SELECT
num,
MAX(date) AS date
FROM projects
WHERE status = 'A'
GROUP BY num
),
last_inactive AS (
SELECT
p.num,
MIN(p.date) AS date
FROM projects p
WHERE p.status = 'I'
AND (
EXISTS (
SELECT 1 FROM last_active la
WHERE la.num = p.num AND la.date < p.date
)
OR NOT EXISTS (
SELECT 1 FROM last_active la
WHERE la.num = p.num
)
)
GROUP BY num
)
SELECT
ls.num,
ls.status,
li.date
FROM last_status ls
JOIN last_inactive li ON li.num = ls.num
WHERE ls.status = 'I'
You can check a working demo with both queries here