Delete duplicate rows of a table based on the value of the previous row

Time:12-31

Let's say I have a SQL Table as such:

serial(pk) name status location time_updated
1 'joe' 'at_home' 'USA' 01:30
2 'jane' 'at_store' 'USA' 02:30
3 'joe' 'driving' 'USA' 12:15
4 'joe' 'driving' 'USA' 13:30
5 'joe' 'at_store' 'USA' 15:00
5 'joe' 'at_store' 'USA' 15:15
6 'joe' 'driving' 'USA' 16:00
7 'joe' 'driving' 'USA' 17:10
8 'joe' 'at_home' 'USA' 20:00
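
A minimal sketch for reproducing the sample data. The table name yourTable (borrowed from the first answer) and all column types are assumptions, since the question does not state them. The sample lists serial 5 twice, so no PRIMARY KEY constraint is declared here.

```sql
-- Assumed table name and column types; adjust to the real schema.
CREATE TABLE yourTable (
    serial       integer,  -- shown as the pk in the question, but 5 appears twice
    name         text,
    status       text,
    location     text,
    time_updated time
);

-- The rows exactly as listed in the question.
INSERT INTO yourTable (serial, name, status, location, time_updated) VALUES
    (1, 'joe',  'at_home',  'USA', '01:30'),
    (2, 'jane', 'at_store', 'USA', '02:30'),
    (3, 'joe',  'driving',  'USA', '12:15'),
    (4, 'joe',  'driving',  'USA', '13:30'),
    (5, 'joe',  'at_store', 'USA', '15:00'),
    (5, 'joe',  'at_store', 'USA', '15:15'),
    (6, 'joe',  'driving',  'USA', '16:00'),
    (7, 'joe',  'driving',  'USA', '17:10'),
    (8, 'joe',  'at_home',  'USA', '20:00');
```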

In this table there can be three different statuses: "at_home", "at_store" and "driving".

I want the movements of each person in chronological order. For instance, for joe, this would look like at_home -> driving -> driving -> at_store -> at_store -> driving -> driving -> at_home.

However, I want to remove any consecutive duplicate 'driving' statuses and keep only the earliest of each run. For instance, for joe, this would look like at_home -> driving -> at_store -> at_store -> driving -> at_home. I do not want to remove duplicate 'at_home' or 'at_store' rows.

In this example, I want to keep the 'driving' status from 12:15 and keep the 'driving' status at 16:00 while removing the duplicate ones that follow.

I want to do this for each person specifically so when I do an 'order by time_updated', I can see all entries for that person in order.

If I query the resulting table using "select * from db where name = 'joe' order by time_updated", my ideal result would be:

serial(pk) name status location time_updated
1 'joe' 'at_home' 'USA' 01:30
3 'joe' 'driving' 'USA' 12:15
5 'joe' 'at_store' 'USA' 15:00
5 'joe' 'at_store' 'USA' 15:15
6 'joe' 'driving' 'USA' 16:00
8 'joe' 'at_home' 'USA' 20:00

Is there a way this can be done in postgres?

Thank you

CodePudding user response:

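You could use a DELETE with a correlated subquery that looks up the status of the immediately preceding row for the same person and removes the row when both are 'driving':

DELETE
FROM yourTable t1
WHERE t1.status = 'driving' AND
      -- status of the same person's most recent earlier row
      (SELECT t2.status
       FROM yourTable t2
       WHERE t2.name = t1.name AND
             t2.time_updated < t1.time_updated
       ORDER BY t2.time_updated DESC
       LIMIT 1) = 'driving';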
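
Before deleting anything, it may help to preview which rows repeat a preceding 'driving' status. The following is a sketch only, using the LAG() window function and the same assumed table name yourTable; prev_status is a hypothetical alias introduced for illustration.

```sql
-- List each 'driving' row whose previous row (same person, by time) is also 'driving'.
SELECT *
FROM (
    SELECT t.*,
           LAG(status) OVER (PARTITION BY name ORDER BY time_updated) AS prev_status
    FROM yourTable t
) s
WHERE status = 'driving' AND
      prev_status = 'driving';
```

With the sample data from the question, this should list joe's rows at 13:30 and 17:10, which are exactly the ones missing from the ideal result.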

CodePudding user response:

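First, fetch that person's rows sorted by time_updated in descending order so that the latest one comes first. You could limit the search to a few rows, or to just the last one:

SELECT * FROM yourTable WHERE name = 'joe' ORDER BY time_updated DESC LIMIT 1;

Then, if the status of that row is 'driving' and the new status is also 'driving', update that row instead of inserting a new one:

-- :current_id stands for the serial of the row returned above
UPDATE yourTable SET time_updated = now() WHERE serial = :current_id;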

You might be able to do this automatically with triggers and procedures directly in Postgres, but that's beyond the scope of this question.
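
As a rough sketch of the trigger idea, and not a tested implementation, a BEFORE INSERT trigger could skip a new 'driving' row when the person's latest row is already 'driving'. Unlike the UPDATE above, this keeps the earliest timestamp of each run, which is what the question asks for. The function name skip_repeated_driving, the trigger name, and the table name yourTable are all hypothetical, and the sketch assumes rows are inserted in chronological order.

```sql
CREATE OR REPLACE FUNCTION skip_repeated_driving() RETURNS trigger AS $$
DECLARE
    last_status text;
BEGIN
    -- Status of this person's most recent existing row (NULL if there is none).
    SELECT status INTO last_status
    FROM yourTable
    WHERE name = NEW.name
    ORDER BY time_updated DESC
    LIMIT 1;

    IF NEW.status = 'driving' AND last_status = 'driving' THEN
        RETURN NULL;  -- returning NULL from a BEFORE ROW trigger cancels the insert
    END IF;

    RETURN NEW;       -- otherwise insert the row unchanged
END;
$$ LANGUAGE plpgsql;

-- EXECUTE FUNCTION needs PostgreSQL 11 or later; older versions use EXECUTE PROCEDURE.
CREATE TRIGGER trg_skip_repeated_driving
BEFORE INSERT ON yourTable
FOR EACH ROW EXECUTE FUNCTION skip_repeated_driving();
```

This only prevents new duplicates; rows already in the table would still need a one-off cleanup.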
