Home > OS >  Most fastest way to find rows in one table which are not present in other table
Most fastest way to find rows in one table which are not present in other table

Time:03-30

I have 2 tables , one called "Booking" and another called "Assignments" there is no relation between these two tables

"Booking" tables contains columns "Resource" ,"start date", "end date", "hours"

"Assignments" table contains columns "Resource" , "start date" , "end date", "hours"

"Booking" table rows are daily records , means start and end date will always be same and hours will be less than 8(working hours in a day)

"Assignment" table rows can be multi day records , means start and end date can be same or different , example start date 01-02-2022 and end date 03-02-2022 and hours can be 24 hours(max 8 hours per day)

Now I have to find all booking rows for a "Resource" , which do not have equivalent assignment row for a particular day and hours

There can be complex scenario in which booking for "resource a" for "1-2-2022" is for "8 hours" , but assignment row for same date and resource is just 6 hours , so I need to find those missing hours too.

What is the fastest and most efficient way to do it? Records can be in millions and preferred programming language is C# , if C# cannot give me a way to process all this in Max 30 minutes ,other approaches are also welcome , like storing these 2 tables in a SQL database and run a query on it , to get results in a 3rd table

Thanks for your help

CodePudding user response:

I would use the backend to do such processes. An index on a Checksum on each table would be efficient. Start by creating the Booking and Assignment tables.

Add the checksums:

ALTER TABLE Booking
ADD Booking_Checksum AS CHECKSUM( [Resource] ,[start date], [end date], [hours]) PERSISTED;

ALTER TABLE Assignment
ADD Assignment_Checksum AS CHECKSUM( [Resource] ,[start date], [end date], [hours]) PERSISTED;

Add indexes:

CREATE NONCLUSTERED INDEX [IX_Booking_Checksum] ON [dbo].[Booking]
(   Booking_Checksum ASC )

CREATE NONCLUSTERED INDEX [IX_Assignment_Checksum] ON [dbo].[Assignment]
(   Assignment_Checksum ASC )

All in Booking but not Assignment:

Select b.*
From Booking b
left join Assignment a on b.Booking_Checksum = a.Assignment_Checksum 
where a.Assignment_Checksum is null

All in Assignment but not in Booking:

Select a.*
From Assignment a
left join Booking b on b.Booking_Checksum = a.Assignment_Checksum 
where b.Booking_Checksum is null
  • Related