I need to build a SQL script to count how many times pairs of different entities appeared in the same DATE and LOCATION. On any given date, there will be multiple locations and many entity IDs. I need to find out how often PAIRS of entities were at the same location on the same date, and count the number of co-occurrences. In reality, I'm going to have many hundreds of distinct entities across 12 months of dates and 20 locations.
Entity | Date | Location |
---|---|---|
A | 1-1-23 | Loc 1 |
B | 1-1-23 | Loc 1 |
C | 1-1-23 | Loc 1 |
D | 1-1-23 | Loc 1 |
E | 1-1-23 | Loc 1 |
F | 1-1-23 | Loc 1 |
A | 1-2-23 | Loc 2 |
B | 1-2-23 | Loc 2 |
D | 1-2-23 | Loc 2 |
C | 1-2-23 | Loc 3 |
F | 1-2-23 | Loc 3 |
B | 1-3-23 | Loc 2 |
A | 1-4-23 | Loc 1 |
F | 1-4-23 | Loc 1 |
A | 1-5-23 | Loc 2 |
C | 1-5-23 | Loc 2 |
D | 1-5-23 | Loc 2 |
E | 1-5-23 | Loc 3 |
I want to count how many times entity A appeared with entity B on the same date and location. The results would look like this (Note - eventually I'll order by Count(desc) but this result lets you see the factorial combinations first):
Entity1 | Entity2 | Count |
---|---|---|
A | B | 2 |
A | C | 2 |
A | D | 3 |
A | E | 1 |
A | F | 2 |
B | C | 1 |
B | D | 2 |
B | E | 1 |
B | F | 1 |
C | D | 2 |
C | E | 1 |
C | F | 2 |
D | E | 1 |
D | F | 1 |
E | F | 1 |
I'm at a bit of a loss on how to do this. My first thought was to:
SELECT t1.Entity as Entity1, t2.Entity as Entity2, COUNT(*) as Count
FROM (
SELECT Entity, CONCAT(Date, Location) AS ConcatenatedValue, COUNT(*)
FROM occurrences
WHERE Year(Date) = 2022) t1,
(SELECT Entity, CONCAT(Date, Location) AS ConcatenatedValue, COUNT(*)
FROM occurrences
WHERE Year(Date) = 2022) t2
WHERE t1.ConcatenatedValue = t2.ConcatenatedValue
GROUP BY Entity1, Entity2
ORDER BY Count
Clearly that doesn't do what I need. HELP! My head is spinning.
CodePudding user response:
You can address this problem with a self join on your table, with the following conditions:
- date must match
- location must match
- 1st table entity is smaller than 2nd table entity
Then you can apply aggregation directly.
SELECT t1.Entity AS entity1,
t2.Entity AS entity2,
COUNT(t1.Date) AS cnt
FROM tab t1
INNER JOIN tab t2
ON t1.Date = t2.Date
AND t1.Location = t2.Location
AND t1.Entity < t2.Entity
GROUP BY entity1, entity2
ORDER BY entity1, entity2
Check the demo here.