my problem ist the following one:
I have a database which receives reports from a server and saves the data into the report table:
and I want to delete and select every report which are made on the same day, except for the first one.
I've already tried to select the reports, which are on the same day:
WITH res as (
select
cis_anlagen.name as plant,
ReportTimestamp,
LAG(ReportTimestamp, 1) OVER (
partition by cis_anlagen.name
ORDER BY ReportTimestamp
) prevTime
from reports
inner join hosts_to_apps using (HostToAppId)
join hosts using(HostId)
Left join cis_anlagen on hosts.anId = cis_anlagen.anId )
select
plant,
ReportTimestamp,
prevTime
from res
where DATEDIFF(ReportTimestamp, prevTime) = 0;
this gives me any report made on the same they, but I still need to exclude the first one.
CodePudding user response:
I want to delete ... every report which are made on the same day, except for the first one
DELETE t1
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp
I.e. we delete the row when the row with the same HostToAppId
and DATE
but greater ReportTimestamp
exitst.
If there exists 2 or more rows for the same HostToAppId
with absolutely the same (and minimal within this day) ReportTimestamp
then all of them will be stored.
I want to ... select every report which are made on the same day, except for the first one.
SELECT t1.*
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp
CodePudding user response:
Instead of using LAG, you could use row_number.
If you use row number partitioned by date (not the whole timestamp, but just the day) and the HostToAppID (Which I am assuming that is an unique identifier of some kind) you would have the the reports numbered for HostToAppID and for day, which would allow you to exclude anything where the row number is not 1.
I can't test it right now but I would go with something like this:
WITH res as (
select
cis_anlagen.name as plant,
ReportTimestamp,
row_number(ReportTimestamp) OVER (
partition by cis_anlagen.name, date(ReportTimestamp)
ORDER BY ReportTimestamp
) rn
from reports
inner join hosts_to_apps using (HostToAppId)
join hosts using(HostId)
Left join cis_anlagen on hosts.anId = cis_anlagen.anId )
select
plant,
ReportTimestamp,
prevTime from res where rn <> 1;
CodePudding user response:
Why all the joins? You want to deleting everything except the oldest row per day, so delete all rows where exists a row for the same day, but earlier time:
delete from reports
where exists
(
select null
from (select * from reports) older
where date(older.reporttimestamp) = date(reports.reporttimestamp)
and older.reporttimestamp < reports.reporttimestamp
);
Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c8a941f3813ae60e14e32dadef46b361
You may wonder about (select * from reports) older
. This is because of a MySQL peculiarity that forbids to select from the same table directly that you are deleting from. In other DBMS that would sinmply be from reports older
.