Athena: Don't Query Non-Parquet Data

Time:11-04

My S3 bucket contains Parquet files as well as files in other formats, such as XML, CRC, and JSON. I would like to query only the Parquet data.

CREATE EXTERNAL TABLE `test`()
ROW FORMAT SERDE 
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' 
STORED AS PARQUET
LOCATION
  's3://some location/'
TBLPROPERTIES (
 'classification'='parquet', 
 'created_by'='system', 
 'has_encrypted_data'='true')

The query below gives me an error:

SELECT * FROM "test" limit 10;

Error Text: HIVE_BAD_DATA: Not valid Parquet file: s3://some location/control_file.ctl expected magic number: PAR1 got: c8

CodePudding user response:

This is not possible.

Amazon Athena will attempt to read every file in the given directory, including its subdirectories.
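
Because every file under the location is read, it can help to find out which objects are not Parquet before pointing a table at the prefix. The error above complains about a missing `PAR1` magic number, which is the 4-byte marker at the start and end of an unencrypted Parquet file. The following is only a minimal sketch, assuming the files have been copied to local disk first; the function name `is_parquet_file` is hypothetical and not part of any AWS or Parquet library.

```python
import os

# Unencrypted Parquet files begin and end with these four bytes.
PARQUET_MAGIC = b"PAR1"


def is_parquet_file(path):
    """Return True if the local file starts and ends with the Parquet magic bytes.

    This checks only the magic bytes; it does not validate the footer metadata.
    """
    size = os.path.getsize(path)
    # Smallest conceivable layout: header magic + 4-byte footer length + footer magic.
    if size < 12:
        return False
    with open(path, "rb") as f:
        head = f.read(4)
        f.seek(-4, os.SEEK_END)
        tail = f.read(4)
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

Files that fail this check (such as the `control_file.ctl` named in the error) are candidates for moving to a different prefix, so that the table location holds Parquet files only.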

CodePudding user response:

If there is a pattern you can use to recognize the Parquet files, try restricting the read to those files: select * from test where regexp_like("$path", '.parquet')

PS: The query above assumes that the Parquet files have .parquet in their file names. I have not tested it.
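
One caveat about the pattern: in a regular expression an unescaped dot matches any character, and an unanchored pattern can match anywhere in the path, so `'.parquet'` may also select files such as checksum sidecars. The sketch below uses Python's `re` module only to approximate how such a pattern behaves; the object paths are made up for illustration, and the stricter pattern `\.parquet$` is a suggestion, not something taken from the answer.

```python
import re

# Hypothetical object paths, for illustration only.
paths = [
    "s3://bucket/data/part-0000.parquet",
    "s3://bucket/data/control_file.ctl",
    "s3://bucket/data/.part-0000.parquet.crc",
    "s3://bucket/data/myparquet_notes.json",
]

loose = re.compile(r".parquet")      # dot matches any character, no anchor
strict = re.compile(r"\.parquet$")   # literal dot, must be at the end of the path

# search() looks for a match anywhere in the string.
loose_matches = [p for p in paths if loose.search(p)]
strict_matches = [p for p in paths if strict.search(p)]

print("loose: ", loose_matches)
print("strict:", strict_matches)
```

Here the loose pattern also picks up the `.crc` and `.json` paths, while the strict one keeps only the path ending in `.parquet`.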
