Home > Enterprise >  Bad performance when joining two sets based on a
Bad performance when joining two sets based on a

Time:10-14

To better illustrate my problem picture the following data set that has Rooms that contain a "range" of animals. To represent the range, each animal is assigned a sequence number in a separate table. There are different animal types and the sequence is "reset" for each of them.

Table A

RoomId StartAnimal EndAnimal GroupType
1 Monkey Bee A
1 Lion Buffalo A
2 Ant Frog B

Table B

Animal Sequence Type
Monkey 1 A
Zebra 2 A
Bee 3 A
Turtle 4 A
Lion 5 A
Buffalo 6 A
Ant 1 B
Frog 2 B

Desired Output

Getting all the animals for each Room based on their Start-End entries, e.g.

RoomId Animal
1 Monkey
1 Zebra
1 Bee
1 Lion
1 Buffalo
2 Ant
2 Frog

I have been able to get the desired output by first creating a view where the rooms have their start and end sequence numbers, and then Join them with the animal list comparing the ranges.

The problem is that this is performing poorly in my real data set where there are around 10k rooms and around 340k animals. Is there a different (better) way to go about this that I'm not seeing?

Example fiddle I'm working with: https://dbfiddle.uk/RnagCTf0

The query I tried is

WITH fullAnimals AS (
    SELECT DISTINCT(RoomId), a.[Animal], ta.[GroupType], a.[sequence] s1, ae.[sequence] s2
    FROM  [TableA] ta
        LEFT JOIN [TableB] a ON a.[Animal] = ta.[StartAnimal] AND a.[Type] = ta.[GroupType]
        LEFT JOIN [TableB] ae ON ae.[Animal] = ta.[EndAnimal] AND ae.[Type] = a.[Type]
)
SELECT DISTINCT(r.Id), Name, b.[Animal], b.[Type]
FROM [TableB] b
    LEFT JOIN fullAnimals ON (b.[Sequence] >= s1 AND b.[Sequence] <= s2)
    INNER JOIN [Rooms] r ON (r.[Id] = fullAnimals.[RoomId]) --this is a third table that has more data from the rooms
WHERE b.[Type] = fullAnimals.[GroupType]

Thanks!

CodePudding user response:

One option, to remove the aggregations, is to use the following joins:

  • between TableA and TableB, to gather "a.StartAnimal" id
  • between TableA and TableB, to gather "a.EndAnimal" id
  • between TableB and the previous two TableBs, to gather only the rows that have b.Sequence between the two values of "a.StartAnimal" id and "b.StartAnimal" id, on the matching "Type".
  • between Table A and Rooms, to gather room infos
SELECT r.*, b.Animal, b.Type
FROM       TableA a
INNER JOIN TableB b1 ON a.StartAnimal = b1.Animal
INNER JOIN TableB b2 ON a.EndAnimal = b2.Animal
INNER JOIN TableB b  ON b.Sequence BETWEEN b1.Sequence AND b2.Sequence
                    AND a.GroupType = b.Type
INNER JOIN Rooms r   ON r.Id = a.roomId 

Check the updated demo here.

  • Related