SQL Shift Timeseries (Get Following Row For Multiple Timestamps)

I have a table that looks like this:

timestamp           | entity_id
-------------------- ----------
2021-12-01 10:00:00 | A
2021-12-01 09:00:00 | A
2021-12-01 08:00:01 | A
2021-12-01 08:00:00 | B
2021-12-01 07:00:00 | A

The timestamp column is UNIQUE, but I don't know in advance how far apart consecutive timestamps are. How can I write a statement that returns the following result?

entity_id | following_entity_id | count
---------- --------------------- ------
A         | A                   | 2
A         | B                   | 1
B         | A                   | 1

When using pandas I'd probably use its shift method, but I need to do this using raw SQL in this case.

CodePudding user response:

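You need the LAG() window function to get the entity_id of the chronologically previous row (or the LEAD() window function to get that of the chronologically following row), and then aggregate over the resulting pairs. With ORDER BY timestamp, LAG() pairs each row with the one listed directly below it in the newest-first sample above: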

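SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
  -- pair every row with the entity_id of the chronologically previous row
  SELECT *, LAG(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) AS t  -- some engines (e.g. MySQL) require an alias on a derived table
WHERE following_entity_id IS NOT NULL  -- the earliest row has no previous row
GROUP BY entity_id, following_entity_id;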
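
To check the query against the sample rows, here is a minimal sketch that loads them into an in-memory SQLite database through Python's sqlite3 module. The choice of SQLite, the TEXT column types, and the helper name transition_counts are assumptions made for illustration (the question does not name a database engine), and window functions need SQLite 3.25 or newer. The sketch runs the same aggregation once with LAG() and once with LEAD().

```python
import sqlite3

# Sample rows from the question (ISO-formatted text sorts chronologically).
SAMPLE_ROWS = [
    ("2021-12-01 10:00:00", "A"),
    ("2021-12-01 09:00:00", "A"),
    ("2021-12-01 08:00:01", "A"),
    ("2021-12-01 08:00:00", "B"),
    ("2021-12-01 07:00:00", "A"),
]

# Same shape as the query in the answer; only the window function varies.
QUERY_TEMPLATE = """
SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
  SELECT entity_id,
         {window_fn}(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) AS t
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id
ORDER BY entity_id, following_entity_id
"""


def transition_counts(conn, window_fn):
    """Hypothetical helper: count (entity_id, neighbour entity_id) pairs."""
    if window_fn not in ("LAG", "LEAD"):
        raise ValueError("window_fn must be 'LAG' or 'LEAD'")
    return conn.execute(QUERY_TEMPLATE.format(window_fn=window_fn)).fetchall()


conn = sqlite3.connect(":memory:")
# Assumed schema: the question only states that timestamp is UNIQUE.
conn.execute("CREATE TABLE tablename (timestamp TEXT UNIQUE, entity_id TEXT)")
conn.executemany("INSERT INTO tablename VALUES (?, ?)", SAMPLE_ROWS)

lag_counts = transition_counts(conn, "LAG")    # pairs: (row, previous row)
lead_counts = transition_counts(conn, "LEAD")  # pairs: (row, next row)

print(lag_counts)
print(lead_counts)
```

On this sample both variants return the same three rows, (A, A, 2), (A, B, 1) and (B, A, 1), because the pair counts happen to be symmetric. On other data the two results are mirror images of each other, so pick the function that matches the direction you mean by "following".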
