I have a table that looks like this:
timestamp           | entity_id
--------------------|----------
2021-12-01 10:00:00 | A
2021-12-01 09:00:00 | A
2021-12-01 08:00:01 | A
2021-12-01 08:00:00 | B
2021-12-01 07:00:00 | A
The timestamp column is UNIQUE, but I don't know in advance how far apart consecutive timestamps are. How can I write a statement that returns the following result?
entity_id | following_entity_id | count
----------|---------------------|------
A         | A                   | 2
A         | B                   | 1
B         | A                   | 1
With pandas I'd probably use its shift method, but in this case I need to do it in raw SQL.
CodePudding user response:
You need the LAG() window function to get the entity_id of the chronologically previous row (or the LEAD() window function to get that of the following row), and then aggregate:
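SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
  -- pair each row with the entity_id of the row just before it in time
  SELECT *, LAG(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
  FROM tablename
) AS t  -- some databases require an alias on the derived table
-- the earliest row has no predecessor, so LAG() returns NULL for it
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id;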
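If you want to try the query locally, here is a minimal sketch that runs it against the sample rows from the question using Python's built-in sqlite3 module. It assumes an in-memory SQLite database (version 3.25 or later, which is when window functions were added) and reuses the placeholder table name tablename from the query above. The helper name count_transitions is made up for illustration, and the final ORDER BY is added only so the output order is predictable.

```python
import sqlite3

# Sample rows copied from the question (timestamp, entity_id).
rows = [
    ("2021-12-01 10:00:00", "A"),
    ("2021-12-01 09:00:00", "A"),
    ("2021-12-01 08:00:01", "A"),
    ("2021-12-01 08:00:00", "B"),
    ("2021-12-01 07:00:00", "A"),
]

# Same query as in the answer, plus an ORDER BY for a stable result order.
QUERY = """
SELECT entity_id, following_entity_id, COUNT(*) AS count
FROM (
    SELECT entity_id,
           LAG(entity_id) OVER (ORDER BY timestamp) AS following_entity_id
    FROM tablename
) AS t
WHERE following_entity_id IS NOT NULL
GROUP BY entity_id, following_entity_id
ORDER BY entity_id, following_entity_id
"""


def count_transitions(data):
    """Load the rows into a throwaway in-memory table and run the query."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE tablename (timestamp TEXT UNIQUE, entity_id TEXT)"
        )
        conn.executemany("INSERT INTO tablename VALUES (?, ?)", data)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


result = count_transitions(rows)
for row in result:
    print(row)
```

For this sample data the rows come back as (A, A, 2), (A, B, 1) and (B, A, 1), which matches the result asked for in the question. The timestamps are stored as text here; that sorts correctly only because they are all in the same YYYY-MM-DD HH:MM:SS format.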