I have a question regarding creating tables in Hadoop. I create external table the following way:
CREATE EXTERNAL HADOOP TABLE SCHEMA.TABLENAME (
ID BIGINT NOT NULL,
CODE INTEGER,
"VALUE" DOUBLE
STORED AS ORC
TBLPROPERTIES ('bigsql.table.io.doAs'='false',
'bucketing_version'='2',
'orc.compress'='ZLIB',
'orc.create.index'='true')
After I created this table I run Jenkins job (with sqoop process) which loads 70.000.000 records to this table. Then I needed to remove this table, so I run:
DROP TABLE SCHEMA.TABLENAME
Later on I want to create a table with the same name as the previous one, but I need it to be empty. I make the same query as earlier, I do:
CREATE EXTERNAL HADOOP TABLE SCHEMA.TABLENAME (
ID BIGINT NOT NULL,
CODE INTEGER,
"VALUE" DOUBLE
STORED AS ORC
TBLPROPERTIES ('bigsql.table.io.doAs'='false',
'bucketing_version'='2',
'orc.compress'='ZLIB',
'orc.create.index'='true')
But when I create table this way, it has 70.000.000 records inside it again, although I didn't run any job to populate it. This is why I have two questions:
- When I drop and create table with old name, then is it recovering records from the old table?
- How can I drop (or truncate) table in bigsql/hive so that I have an empty table with the old name.
I am using bigsql and hive.
CodePudding user response:
Dropping an external table doesn't remove the stored data, only the metadata from the Hive Metastore.
Refer Managed vs External Tables
Key points...
- Use external tables when files are already present or in remote locations
- files should remain even if the table is dropped
Create a managed table (remove EXTERNAL
from your query), if you want to be able to DROP and/or TRUNCATE.
Or have your Jenkins job run hadoop fs -rm -skipTrash
before the import.