Home > database >  MySQL Select delete any element except for first one per day
MySQL Select delete any element except for first one per day

Time:04-12

my problem ist the following one:

I have a database which receives reports from a server and saves the data into the report table:

enter image description here

and I want to delete and select every report which are made on the same day, except for the first one.

I've already tried to select the reports, which are on the same day:

    WITH res as (
select 
cis_anlagen.name as plant,
ReportTimestamp,
LAG(ReportTimestamp, 1) OVER (
    partition by cis_anlagen.name
    ORDER BY ReportTimestamp
) prevTime
from reports
inner join hosts_to_apps using (HostToAppId)
join hosts using(HostId)
Left join cis_anlagen on hosts.anId = cis_anlagen.anId )
select 
    plant,
    ReportTimestamp,
    prevTime
from res
where DATEDIFF(ReportTimestamp, prevTime) = 0;

this gives me any report made on the same they, but I still need to exclude the first one.

CodePudding user response:

I want to delete ... every report which are made on the same day, except for the first one

DELETE t1
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
               AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp

I.e. we delete the row when the row with the same HostToAppId and DATE but greater ReportTimestamp exitst.

If there exists 2 or more rows for the same HostToAppId with absolutely the same (and minimal within this day) ReportTimestamp then all of them will be stored.


I want to ... select every report which are made on the same day, except for the first one.

SELECT t1.*
FROM reports t1
JOIN reports t2 ON t1.HostToAppId = t2.HostToAppId
               AND DATE(t1.ReportTimestamp) = DATE(t2.ReportTimestamp)
WHERE t1.ReportTimestamp > t2.ReportTimestamp

CodePudding user response:

Instead of using LAG, you could use row_number.

If you use row number partitioned by date (not the whole timestamp, but just the day) and the HostToAppID (Which I am assuming that is an unique identifier of some kind) you would have the the reports numbered for HostToAppID and for day, which would allow you to exclude anything where the row number is not 1.

I can't test it right now but I would go with something like this:

WITH res as (
select 
    cis_anlagen.name as plant,
    ReportTimestamp,
    row_number(ReportTimestamp) OVER (
        partition by cis_anlagen.name, date(ReportTimestamp)
        ORDER BY ReportTimestamp
        ) rn
    from reports
    inner join hosts_to_apps using (HostToAppId)
    join hosts using(HostId)
    Left join cis_anlagen on hosts.anId = cis_anlagen.anId )

    select 
plant,
ReportTimestamp,
prevTime from res where rn <> 1;

CodePudding user response:

Why all the joins? You want to deleting everything except the oldest row per day, so delete all rows where exists a row for the same day, but earlier time:

delete from reports
where exists
(
  select null
  from (select * from reports) older
  where date(older.reporttimestamp) = date(reports.reporttimestamp)
  and older.reporttimestamp < reports.reporttimestamp
);

Demo: https://dbfiddle.uk/?rdbms=mysql_8.0&fiddle=c8a941f3813ae60e14e32dadef46b361

You may wonder about (select * from reports) older. This is because of a MySQL peculiarity that forbids to select from the same table directly that you are deleting from. In other DBMS that would sinmply be from reports older.

  • Related