Old records appear in the Hadoop table after drop and creating new table with the same old name

Time:11-08

I have a question about creating tables in Hadoop. I create an external table as follows:

CREATE EXTERNAL HADOOP TABLE SCHEMA.TABLENAME (
  ID BIGINT NOT NULL,
  CODE INTEGER,
  "VALUE" DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('bigsql.table.io.doAs'='false',
  'bucketing_version'='2',
  'orc.compress'='ZLIB',
  'orc.create.index'='true')

After creating this table, I ran a Jenkins job (with a Sqoop process) that loaded 70,000,000 records into it. Then I needed to remove the table, so I ran:

DROP TABLE SCHEMA.TABLENAME

Later, I wanted to create a table with the same name as the previous one, but empty. I ran the same statement as before:

CREATE EXTERNAL HADOOP TABLE SCHEMA.TABLENAME (
  ID BIGINT NOT NULL,
  CODE INTEGER,
  "VALUE" DOUBLE
)
STORED AS ORC
TBLPROPERTIES ('bigsql.table.io.doAs'='false',
  'bucketing_version'='2',
  'orc.compress'='ZLIB',
  'orc.create.index'='true')

But the table created this way contains the 70,000,000 records again, although I didn't run any job to populate it. So I have two questions:

  1. When I drop a table and create a new one with the same name, does it recover the records from the old table?
  2. How can I drop (or truncate) a table in Big SQL/Hive so that I end up with an empty table with the old name?

I am using Big SQL and Hive.

Answer:

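Dropping an external table doesn't remove the stored data, only the metadata from the Hive Metastore. The files stay in the table's directory, so a new external table created with the same name (and therefore the same default location) picks them up again.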

Refer to Managed vs External Tables.

Key points:

  • Use external tables when the files are already present or in remote locations
  • Use external tables when the files should remain even if the table is dropped
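
To see this for yourself, you could look at the table's HDFS directory after the DROP. Below is a minimal sketch, not a definitive script: the function name `show_table_files` and the example path are made up for illustration, and the `HADOOP_CMD` override is only there so the commands can be dry-run with `echo`. Take the real directory from the table's LOCATION.

```shell
#!/usr/bin/env bash
# Sketch: show what is still stored on HDFS at a table's location.
# HADOOP_CMD is an assumption of this sketch; set HADOOP_CMD=echo for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

show_table_files() {
  local table_dir="$1"   # HDFS directory backing the table (placeholder path)
  # -du -s -h prints the total size of the directory in human-readable units
  "$HADOOP_CMD" fs -du -s -h "$table_dir"
  # -ls lists the files in the directory
  "$HADOOP_CMD" fs -ls "$table_dir"
}

# Example call (the path is hypothetical):
# show_table_files /path/to/schema.db/tablename
```

If the ORC files are still listed after DROP TABLE, that is the data the re-created table is reading.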

Create a managed table (remove EXTERNAL from your query) if you want DROP and/or TRUNCATE to remove the data as well.

Or have your Jenkins job run hadoop fs -rm -r -skipTrash on the table's HDFS directory before the import.
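
As a rough sketch of that Jenkins step, assuming you know the table's HDFS directory: the function name `clear_table_dir`, the `HADOOP_CMD` override, and the example path are hypothetical and only for illustration. It deletes the files under the directory rather than the directory itself, so the table's location keeps existing.

```shell
#!/usr/bin/env bash
# Sketch of a pre-import cleanup step; adapt the path to your table's LOCATION.
# HADOOP_CMD is an assumption of this sketch; set HADOOP_CMD=echo for a dry run.
HADOOP_CMD="${HADOOP_CMD:-hadoop}"

clear_table_dir() {
  local table_dir="$1"   # HDFS directory backing the table (placeholder path)
  # Refuse an empty path or the HDFS root to avoid deleting far too much
  case "$table_dir" in
    ""|"/") echo "refusing to delete '$table_dir'" >&2; return 1 ;;
  esac
  # -r recurses, -f does not fail when nothing matches,
  # -skipTrash deletes immediately instead of moving files to the trash.
  # The glob is quoted so the HDFS shell expands it, not the local shell.
  "$HADOOP_CMD" fs -rm -r -f -skipTrash "${table_dir%/}/*"
}

# Example call before the Sqoop import (the path is hypothetical):
# clear_table_dir /path/to/schema.db/tablename
```

Because -skipTrash makes the delete unrecoverable, it is worth doing a dry run with HADOOP_CMD=echo first to check the path.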
