Home > Back-end >  How to optimize the insert query from multiple tables?
How to optimize the insert query from multiple tables?

Time:10-12

I have 2 tables, Table 1 (temp table in SP) has around 400 records. Table 2 has around 30,550,284 records.

I need to run a loop on table 1 for each record and get the top 1 from table 2 based on a few conditions (where clause) and then order by modified date in decreasing order.

There is an index on the modified date.

declare @iPos int;
declare @iCount int;
select @iCount = count(*) from Table1;
set @iPos = 1;

declare @Table2 table(......)
declare @timestampLocal2 datetime
while (@iPos <= @iCount)
BEGIN
   select @val1 = Col1, @timestampLocal = TimeStamp  
   from @Table1 where ID = @iPos

   set @timestampLocal2 = DATEADD(HH,-96,@timestampLocal) 
   INSERT INTO @Temp3 ( .... ),....)
    select   top 1 r.LastModified, r.[Col2], r.Col3, @iPos
    from Table2  (NOLOCK) r 
    where  Col1 =@val1 and  
    r.LastModified <= @timestampLocal
    and r.LastModified >=  @timestampLocal2
    and (r.Col2 is not null and r.Col3 is not null) 
    order by LastModified  desc

    SELECT @iPos = @iPos   1;
END

This query is very slow.

I have also thought to archive table 2, But I want to keep that as the second option for now.

Do I really need to add an index on the columns which are involved in the where clause?

So my question is, is there a better performant way to do this?

CodePudding user response:

I am not sure about the logic and relationship of the tables as you have not mentioned it, but the query you have written can be simplified without using a while loop like the following in a single insert statement which will be definitely faster compared to a while loop.

 declare @Table2 table(......)

   INSERT INTO @Temp3 ( .... ),....)
    select top 1 r.LastModified, r.[Col2], r.Col3,  t1.ID
    from Table2  (NOLOCK) r 
    inner join @Table1 t1 on r.Col1 = t1.Col1
    where r.LastModified <= t1.[TimeStamp]
    and r.LastModified >=  DATEADD(HH,-96,t1.[TimeStamp])
    and (r.Col2 is not null and r.Col3 is not null) 
    order by LastModified  desc

CodePudding user response:

I believe a CROSS APPLY or OUTER APPLY may do the trick. These can be thought of as being similar to INNER JOIN or LEFT JOIN, except that they allow you to reference a subquery having more complex conditions such as TOP 1 and ORDER BY. Ideal for cases like this.

-- INSERT INTO @Temp3 ( .... )
select r.LastModified, r.[Col2], r.Col3, t1.ID
from @Table1 t1
cross apply (
    SELECT TOP 1 r.*
    from Table2 r -- Don't use (NOLOCK)
    where r.Col1 = t.Col1
    and r.LastModified <= t1.[TimeStamp]
    and r.LastModified >= DATEADD(HH,-96,t1.[TimeStamp])
    and (r.Col2 is not null and r.Col3 is not null) 
    order by r.LastModified desc
) r

For efficiency, I recommend an index on Table2(Col1,LastModified) or as an absolute minimum, an index on Table2(Col1).

I would strongly discourage the use of (NOLOCK) or 'READ UNCOMMITTED` in queries that update the database (like the insert into table3 above). While the query may appear to work most of the time, seemingly random occurrences of missing or duplicate rows may result.

Do you need to handle cases where no matching Table2 record is found? The above will quietly ignore such cases. Changing the CROSS APPLY to an OUTER APPLY together with logic to handle null r.xxx values could be what you need.

  • Related