Multiple tables vs. one partitioned table

Time:07-26

I am new to Hive, and at my current job I saw an approach where a separate table is created for each date, e.g. TableA_20220701, TableA_20220702, and so on.

I am wondering if this is any better than just creating one table that is partitioned by date.

CodePudding user response:

Preferable - one table partitioned on a date column.

One table with partition on date_column -
Pros -

  1. One easily manageable object. Each partition behaves like a separate table and is stored in its own folder, so you can create or drop partitions easily.
  2. Your application/tool doesn't need to know the exact partition name, so for apps it's just one object.
  3. You can query the whole dataset.

Cons - You need an extra partition column, which Hive places at the end of the table's column list.
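
As a rough sketch of option 1 in HiveQL: the table name `table_a`, the columns `id` and `payload`, and the partition column `dt` are all made up for illustration, so adjust them to your real schema.

```sql
-- Hypothetical schema; id and payload are placeholder columns.
-- The partition column dt is not repeated in the column list:
-- Hive derives it from the partition folder and shows it last.
CREATE TABLE table_a (
  id      BIGINT,
  payload STRING
)
PARTITIONED BY (dt STRING);

-- Create or drop one day's partition without touching the other days.
ALTER TABLE table_a ADD IF NOT EXISTS PARTITION (dt = '2022-07-01');
ALTER TABLE table_a DROP IF EXISTS PARTITION (dt = '2022-07-01');

-- List the partitions that currently exist.
SHOW PARTITIONS table_a;

-- Filtering on dt lets Hive read only the matching partition folder.
SELECT COUNT(*) FROM table_a WHERE dt = '2022-07-02';
```

The application only ever refers to `table_a`; which day it reads is decided by the `WHERE dt = ...` filter.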

Multiple tables -
Pros - You don't need an extra column.
Cons -

  1. You lose all the pros of option 1.
  2. You will end up with hundreds of tables after a year of running, which is very difficult to manage.
  3. If you want to query all the data, you have to union all the tables together in one query, which becomes tedious over time.
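
To illustrate the last point, here is a sketch of the same three-day read under both designs. The columns `id` and `payload`, the partitioned table `table_a`, and its partition column `dt` are hypothetical names; only the `TableA_YYYYMMDD` pattern comes from the question.

```sql
-- One table per day: each additional day needs another UNION ALL branch,
-- and the query text has to change whenever the date range changes.
SELECT id, payload FROM TableA_20220701
UNION ALL
SELECT id, payload FROM TableA_20220702
UNION ALL
SELECT id, payload FROM TableA_20220703;

-- One partitioned table: the date range is just a filter on the
-- partition column, so the query shape stays the same for any range.
SELECT id, payload
FROM table_a
WHERE dt BETWEEN '2022-07-01' AND '2022-07-03';
```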