Home > database >  Count and rank-order co-occurrences of entities across dates and locations in MySQL
Count and rank-order co-occurrences of entities across dates and locations in MySQL

Time:01-06

I need to build a SQL script to count how many times pairs of different entities appeared in the same DATE and LOCATION. On any given date, there will be multiple locations and many entity IDs. I need to find out how often PAIRS of entities were at the same location on the same date, and count the number of co-occurrences. In reality, I'm going to have many hundreds of distinct entities across 12 months of dates and 20 locations.

Entity Date Location
A 1-1-23 Loc 1
B 1-1-23 Loc 1
C 1-1-23 Loc 1
D 1-1-23 Loc 1
E 1-1-23 Loc 1
F 1-1-23 Loc 1
A 1-2-23 Loc 2
B 1-2-23 Loc 2
D 1-2-23 Loc 2
C 1-2-23 Loc 3
F 1-2-23 Loc 3
B 1-3-23 Loc 2
A 1-4-23 Loc 1
F 1-4-23 Loc 1
A 1-5-23 Loc 2
C 1-5-23 Loc 2
D 1-5-23 Loc 2
E 1-5-23 Loc 3

I want to count how many times entity A appeared with entity B on the same date and location. The results would look like this (Note - eventually I'll order by Count(desc) but this result lets you see the factorial combinations first):

Entity1 Entity2 Count
A B 2
A C 2
A D 3
A E 1
A F 2
B C 1
B D 2
B E 1
B F 1
C D 2
C E 1
C F 2
D E 1
D F 1
E F 1

I'm at a bit of a loss on how to do this. My first thought was to:

SELECT t1.Entity as Entity1, t2.Entity as Entity2, COUNT(*) as Count
FROM (
SELECT Entity, CONCAT(Date, Location) AS ConcatenatedValue, COUNT(*) 
FROM occurrences 
WHERE Year(Date) = 2022) t1,
(SELECT Entity, CONCAT(Date, Location) AS ConcatenatedValue, COUNT(*)
FROM occurrences
WHERE Year(Date) = 2022) t2
WHERE t1.ConcatenatedValue = t2.ConcatenatedValue
GROUP BY Entity1, Entity2
ORDER BY Count

Clearly that doesn't do what I need. HELP! My head is spinning.

CodePudding user response:

You can address this problem with a self join on your table, with the following conditions:

  • date must match
  • location must match
  • 1st table entity is smaller than 2nd table entity

Then you can apply aggregation directly.

SELECT t1.Entity      AS entity1,
       t2.Entity      AS entity2,
       COUNT(t1.Date) AS cnt
FROM       tab t1
INNER JOIN tab t2
        ON t1.Date = t2.Date 
       AND t1.Location = t2.Location 
       AND t1.Entity < t2.Entity
GROUP BY entity1, entity2
ORDER BY entity1, entity2

Check the demo here.

  • Related