Home > Net >  Exclude certain S3 folders while creating a view on Hive
Exclude certain S3 folders while creating a view on Hive

Time:11-30

I am trying to create a view to fetch data from a bucket by excluding certain folders inside S3 on Hive. I was able to successfully create view on Athena, but couldn't do the same on Hive.

Athena View:

CREATE VIEW test
as
SELECT *
FROM TABLE_A
WHERE NOT ("$PATH LIKE '%PASSENGER_DATA%')
AND NOT ("$PATH LIKE '%CUSTOMER_DATA%');

Could you please advise how the same could be achieved on Hive?

CodePudding user response:

There isn't the same facility to filter by path. However, depending on what version you are using you could use Ranger to exclude the data so it wasn't shown.

If you must do it by view try using:

CREATE TABLE filter_out [blah blah blah]
LOCATION '%CUSTOMER_DATA%'

SELECT *
FROM TABLE_A
WHERE NOT EXISTS (SELECT ID FROM filter_out WHERE TABLE_A.ID = filter_out.ID)

CodePudding user response:

You may actually wish to consider moving the data into its own folders. Then you could build them as a table with:

CREATE VIEW TABLE_DATA
as
SELECT *
FROM TABLE_A --("$PATH LIKE '%PASSENGER_DATA%')
UNION
SELECT *
FROM TABLE_B -- ("$PATH LIKE '%CUSTOMER_DATA%');

THis likely will also make your permission issues easier to manage.

And when needed you could easily use one table or both tables.

  • Related