nested query with two tables take more than 20 minutes-CodePudding

Here is our query which took 1229.206 seconds to execute that SQL (returning 8310286 rows):

SELECT t_01.object_uid 
FROM HashTable t_01 
WHERE t_01.object_uid IN (SELECT t_02.puid  
                          FROM ObjectTable t_02 
                          WHERE (t_02.arev_category IN (48, 40)))

Plan hash value: 1560846306

------------------------------------------------------------------------------------------------
| Id  | Operation                            | Name              | E-Rows |E-Bytes| Cost (%CPU)|
------------------------------------------------------------------------------------------------
|   0 | SELECT STATEMENT                     |                   |        |       |   780K(100)|
|   1 |  NESTED LOOPS SEMI                   |                   |   7764K|   244M|   780K  (1)|
|   2 |   INDEX FULL SCAN                    | PIHashTable       |   7764K|   111M|  4073   (1)|
|*  3 |   TABLE ACCESS BY INDEX ROWID BATCHED| ObjectTable       |    290M|  4986M|     1   (0)|
|*  4 |    INDEX RANGE SCAN                  | PIOBJECTTABLE     |      1 |       |     1   (0)|
------------------------------------------------------------------------------------------------

table HashTable has 51154 blocks, last analyzed 2022/04/19 with 7764715 rows
    index PIHashTable on HashTable (OBJECT_UID)  last analyzed 2022/04/19 over 7764715 rows
table ObjectTable has 3327616 blocks, last analyzed 2022/05/02 with 290473386 rows
    index PIPPOM_OBJECT on ObjectTable (PUID)  last analyzed 2022/05/02 over 290473386 rows

Table ObjectTable has 290 million rows and Hashtable has 7 million rows.

Any way to optimize this?

CodePudding user response：

It's most likely going to take a while to return 8M rows. You should implement paging instead of returning all of the rows at once. It looks like you're using Oracle so try this:

select *
from (
    SELECT t_01.object_uid, row_number() over(order by t_01.object_uid) r
    FROM HashTable t_01 
    WHERE t_01.object_uid IN (
        SELECT t_02.puid  FROM ObjectTable t_02 WHERE (t_02.arev_category IN (48,40))
    )
)
where r between 1 and 1000;

Then you can get rows 1 - 1000 then 1001 - 2000 and so on. This has the added benefit of using less memory and CPU on the server per query but also returning data sooner that you (or the user) can act on while additional data is loaded in the background

CodePudding user response：

If you implement this as a join and put an index on t_02.arev_category it will be very quick.

SELECT t_01.object_uid 
FROM HashTable t_01 
JOIN ObjectTable t_02 ON t_02.arev_category IN (48, 40) and t_02.puid = t_01.object_uid

If the object table can contain the same puid for both categories 40 and 48 then do it like this:

SELECT t_01.object_uid 
FROM HashTable t_01 
JOIN (SELECT DISTINCT puid
      FROM ObjectTable
      WHERE arev_category IN (48, 40)
) t_02 ON t_02.puid = t_01.object_uid