My S3 bucket contains Parquet files as well as files in other formats such as XML, CRC, and JSON. I would like to query only the Parquet data.
CREATE EXTERNAL TABLE `test`()
ROW FORMAT SERDE
'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS PARQUET
LOCATION
's3://some location/'
TBLPROPERTIES (
'classification'='parquet',
'created_by'='system',
'has_encrypted_data'='true')
The query below gives me an error:
SELECT * FROM "test" limit 10;
Error Text: HIVE_BAD_DATA: Not valid Parquet file: s3://some location/control_file.ctl expected magic number: PAR1 got: c8
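The error says that Athena tried to read control_file.ctl as Parquet and did not find the PAR1 magic number. Parquet files begin and end with the 4-byte marker PAR1. As a rough way to find out which files under the location are not Parquet, here is a minimal Python sketch that checks a local copy of a file for that marker. It assumes the objects have been downloaded first, and the function name looks_like_parquet is hypothetical.

```python
import os

PARQUET_MAGIC = b"PAR1"


def looks_like_parquet(path):
    """Return True if the file starts and ends with the 4-byte Parquet marker."""
    size = os.path.getsize(path)
    # Too short to hold both the leading and the trailing marker.
    if size < 2 * len(PARQUET_MAGIC):
        return False
    with open(path, "rb") as f:
        head = f.read(len(PARQUET_MAGIC))
        # Jump to the last four bytes of the file.
        f.seek(-len(PARQUET_MAGIC), os.SEEK_END)
        tail = f.read(len(PARQUET_MAGIC))
    return head == PARQUET_MAGIC and tail == PARQUET_MAGIC
```

This only inspects the marker bytes; it does not validate the footer or the schema, so a True result means "plausibly Parquet", not "readable by Athena".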
CodePudding user response:
This is not possible.
Amazon Athena will attempt to read every file under the table's LOCATION, including files in its subdirectories.
CodePudding user response:
If the Parquet files follow a naming pattern you can recognize, try restricting the query to those files by filtering on the "$path" column:
select * from test where regexp_like("$path", '\.parquet$')
PS: The query above assumes the Parquet file names end in .parquet. I did not test it.
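When filtering paths with a regular expression, keep in mind that an unescaped dot matches any character and that a pattern without a trailing $ can match anywhere in the path. The Python sketch below illustrates the difference. It uses re.search as a stand-in for a "contains"-style regex match, and the object keys are hypothetical examples, not taken from the bucket in the question.

```python
import re

# Hypothetical object keys mixing Parquet data with other files.
keys = [
    "data/part-00000.snappy.parquet",
    "data/.part-00000.snappy.parquet.crc",
    "data/control_file.ctl",
    "data/parquet_notes.json",
]

loose = re.compile(r".parquet")     # "." matches any character, no anchor
strict = re.compile(r"\.parquet$")  # literal dot, must be at the end of the key

loose_hits = [k for k in keys if loose.search(k)]
strict_hits = [k for k in keys if strict.search(k)]

print("loose: ", loose_hits)
print("strict:", strict_hits)
```

With these keys, the loose pattern also picks up the .crc checksum file and the JSON file whose name merely contains "parquet", while the strict pattern keeps only the key that ends in .parquet.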