I have a table (table_old) with a column (person_id) that sometimes has duplicate values. I want to add a column containing the maximum value of age_at_midmonth for each value of person_id. table_old has 27,236,296 rows; table_new has 27,209,850 rows. The number of rows should not decrease, since this is a LEFT JOIN with no WHERE clause. What is happening here? Am I doing this wrong? Is there a better way? The query is running in Oracle.
create table schema1.table_new as
select
    t1.*,
    t2.current_age
from schema1.table_old t1
left join (
    select person_id, max(age_at_midmonth) as current_age
    from schema1.table_old
    group by person_id
) t2
    on t1.person_id = t2.person_id;
CodePudding user response:
The issue here was that I was looking at the number of rows using the Analyze Table feature and then looking at the Stats/Size info for the table using Toad for Oracle. It appears that running Analyze Table does not necessarily produce an accurate count of rows. When I ran select count(*) from table_new; and the same for table_old, I got the exact same number of rows. Oops.
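For anyone hitting the same discrepancy, here is a minimal sketch of how the two counts can be compared. DBMS_STATS.GATHER_TABLE_STATS and the ALL_TABLES dictionary view are standard Oracle features; the schema and table name are just the ones from the question:

-- exact row count (can be slow on a table this size)
select count(*) from schema1.table_new;

-- refresh optimizer statistics, then look at the estimated count
begin
    dbms_stats.gather_table_stats(ownname => 'SCHEMA1', tabname => 'TABLE_NEW');
end;
/

select num_rows, last_analyzed
from   all_tables
where  owner = 'SCHEMA1'
and    table_name = 'TABLE_NEW';

The num_rows figure is an optimizer estimate taken at last_analyzed time, which is why it can lag behind the exact count(*).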
CodePudding user response:
I know this wasn't your actual question, but you don't even need a LEFT JOIN for that example. It is easier to use an analytic function instead of the aggregate in a subquery. For instance, your example query could be turned into:
SELECT t.*,
MAX(t.age_at_midmonth) OVER (PARTITION BY t.person_id) AS CURRENT_AGE
FROM schema1.table_old t;
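If you still need to materialize the result, here is a sketch of the same analytic query wrapped in the CREATE TABLE AS SELECT from the question (table and column names are taken from the original query):

create table schema1.table_new as
select
    t.*,
    max(t.age_at_midmonth) over (partition by t.person_id) as current_age
from schema1.table_old t;

This typically reads table_old once instead of twice, which can matter at 27 million rows, though the optimizer may already transform the join version in a similar way.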