I'm trying to update two columns in an archaic Oracle database, but the query simply doesn't finish and nothing is updated. Any ideas to improve the query or something else that can be done? I don't have DBA skills/knowledge and unsure if indexing would help, so would appreciate comments in that area, too.
PERSON table: This table has 200 million distinct person_id's. There are no duplicates. The person_id is numeric and am trying to update the favorite_color and color_confidence columns, which are varchar2 and values currently NULLed out.
person table
person_id favorite_color color_confidence many_other_columns
222
333
444
TEMP_COLOR_CONFIDENCE table: I'm trying to get the favorite_color and color_confidence from this table and update to the PERSON table. This table has 150 million distinct person's, again nothing duplicated.
temp_color_confidence
person_id favorite_color color_confidence
222 R H
333 Y L
444 G M
This is my update query, which I realize only updates those found in both tables. Eventually I'll need to update the remaining 50 million with "U" -- unknown. Solving that in one shot would be ideal too, but currently just concerned that I'm not able to get this query to complete.
UPDATE person p
SET (favorite_color, color_confidence) =
(SELECT t.favorite_color, t.color_confidence
FROM temp_color_confidence t
WHERE p.person_id = t.person_id)
WHERE EXISTS (
SELECT 1
FROM temp_color_confidence t
WHERE p.person_id = t.person_id );
Here's where my ignorance shines... would indexing on person_id help, considering they are all distinct anyway? Would indexing on favorite_color help? There are less than 10 colors and only 3 confidence values.
CodePudding user response:
For every person, it has to find the corresponding row in temp_color_confidence. The way to do that with the least I/O is to scan each table once and crunch them together in a single hash join, ideally all in memory. Indexes are unlikely to help with that, unless maybe temp_color_confidence is very wide and verbose and has an index on (person_id, favorite_color, color_confidence) which the optimiser can treat as a skinny table.
Using merge
might be more efficient as it can avoid the second scan of temp_color_confidence:
merge into person p
using temp_color_confidence t
on (p.person_id = t.person_id)
when matched then update
set p.favorite_color = t.favorite_color, p.color_confidence = t.color_confidence;
If you are going to update every row in the table, though, you might consider instead creating a new table containing all the values you need:
create table person2
( person_id, favorite_color, color_confidence )
pctfree 0 compress
as
select p.person_id, nvl(t.favorite_color,'U'), nvl(t.color_confidence,0)
from person p
left join temp_color_confidence t
on t.person_id = p.person_id;