I currently have the dataset below:
Group | Start | End |
---|---|---|
A | 2021-01-01 | 2021-04-05 |
A | 2021-01-01 | 2021-06-05 |
A | 2021-03-01 | 2021-06-05 |
B | 2021-06-13 | 2021-08-05 |
B | 2021-06-13 | 2021-09-05 |
B | 2021-07-01 | 2021-09-05 |
C | 2021-10-07 | 2021-10-17 |
C | 2021-10-07 | 2021-11-15 |
C | 2021-11-12 | 2021-11-15 |
I want like the following final dataset: Essentially, I would like to remove all observations that don't equal the minimum start value and I want to do this by group.
Group | Start | End |
---|---|---|
A | 2021-01-01 | 2021-04-05 |
A | 2021-01-01 | 2021-06-05 |
B | 2021-06-13 | 2021-08-05 |
B | 2021-06-13 | 2021-09-05 |
C | 2021-10-07 | 2021-10-17 |
C | 2021-10-07 | 2021-11-15 |
I tried the following code but I cannot do a min statement in a where clause. Any help would be appreciated.
Delete from #df1
where start != min(start)
CodePudding user response:
If you want to remove all rows, that have not the same [start] you can join a subquery which find the earliest day, you can add additional ON
clauses if you need to find other rows as well
DELETE o1 FROM observations o1 INNER JOIN(SELECT MIN([Start]) minstart , [Group] FROM observations GROUP BY [Group] ) o2 ON o1.[Group] = o2.[Group] AND o1.[Start] <> o2.minstart
SELECT * FROM observations
Group | Start | End :---- | :--------- | :--------- A | 2021-01-01 | 2021-04-05 A | 2021-01-01 | 2021-06-05 B | 2021-06-13 | 2021-08-05 B | 2021-06-13 | 2021-09-05 C | 2021-10-07 | 2021-10-17 C | 2021-10-07 | 2021-11-15
db<>fiddle here
CodePudding user response:
Another alternative using a CTE:
with keepers as (
select [Group], min(Start) as mStart
from #df1 group by [Group]
)
delete src
from #df1 as src
where not exists (select * from keepers
where keepers.[Group] = src.[Group] and keepers.mStart = src.Start)
;
You should make an effort to avoid using reserved words as names, since that requires extra effort to write sql using those.
CodePudding user response:
You can try this
DELETE FROM table_name WHERE start IN (SELECT MIN(start) FROM table_name GROUP BY start)