Hive: optimizing a join between a large partitioned table and a small table

Time:10-27

I have a very large table, tb_a (more than 20 TB of data), that is dynamically partitioned by the date column date_dt, and a small table, tb_b, that stores the dates whose data needs to be pulled from the large table.
For example, the small table holds three records, meaning that three days of data (that is, three partitions) need to be read from the large table:
2019-01-01
2019-01-05
2019-01-10
select /*+ MAPJOIN(t1) */
*
from tb_a t0
left semi join tb_b t1
on t0.date_dt = t1.date_dt
With this query, the large table still gets a full table scan instead of reading only the three partitions.
How can this be optimized?

CodePudding user response:

You can use select * from tb_a t0 where t0.date_dt in (select date_dt from tb_b); if you also need all the fields of tb_b, just join to tb_b again.
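
The suggestion above could be written out roughly as follows. This is only a sketch using the table and column names from the question; the aliases b and t1 in the second statement are added for illustration. Whether the IN subquery really limits the scan of tb_a to the matching partitions depends on the Hive version and execution engine, so it is worth checking the plan with EXPLAIN before relying on it.

```sql
-- Keep only the rows of the large table whose partition date
-- appears in the small table.
select t0.*
from tb_a t0
where t0.date_dt in (select b.date_dt from tb_b b);

-- If the columns of tb_b are needed as well, join tb_b again
-- on top of the filtered result.
-- Assumption: date_dt is unique in tb_b; otherwise rows are duplicated.
select t0.*, t1.*
from (
  select a.*
  from tb_a a
  where a.date_dt in (select b.date_dt from tb_b b)
) t0
join tb_b t1
  on t0.date_dt = t1.date_dt;
```

If the plan still shows every partition of tb_a being read, one fallback is to look up the dates in tb_b first and pass them as literal values in the filter on date_dt, since a literal partition filter can be resolved when the query is compiled.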