Home > Back-end >  How read the parquet file located on s3 using Trino?
How read the parquet file located on s3 using Trino?

Time:07-04

I`m beginer in the S3 and Trino(Presto). I have: installed on-premises Trino(Presto) using Docker. S3 bucket where located files with parquet file exension. Im install Apache-Drill and in the box he has s3 conector. And in the Apache-Drill a an use

SELECT * FROM `s3a`.`root`.`./files.parquet`

to select all from file parquet. I thy to do this in Trino(Presto) but i not have result(have error). How i can do same in the Trino(Presto)?

P.s.: When i create the external table with option external_location and format parquet - i can select from it. But do select from exist parquet - i cannot.

CodePudding user response:

I`m beginer in the Presto(Trino) and Apache Drill. In the situation with Apache Dril and S3 - you can generate file "parquet" and upload to the S3. Then you config the connector(catalog) in the Drill and do something like

SELECT * FROM `s3a`.`root`.`./filename.parquet` 

This do the select from exists file parquet. Apache Drill do scan metadata automaticaly. In the example the Trino(Presto) - you config catalog and create schema in the catalog. But do the SELECT from file you cannot. You mast create folder on the s3 and upload parquet files in created dir. In the Presto you must create table with correct metadata and set external_location and format PARQUET pointed to the dir. After then you can do the select from table which indicate to the dir where located parquet files BUT you can only select columns which you created when created table. For example: We have files named userdata.parquet and s3 root access to the backet named "testbucket". The file userdata.parquet has 2 columns (name varchar(20),password varchar(50)) You create on S3 folder usertable and upload in to this folder file userdata.parquet. In presto cmd you must create the table :

create table users (name varchar(20), password varchar(50)) with (external_location='s3a://testbucket/usertable',format='PARQUET');

Now you cat do in the Presto CLI:

select * from users;

Or you can create 2 tables:

create table users1 (name varchar(20)) with (external_location='s3a://testbucket/usertable',format='PARQUET');

create table users2 (password varchar(50)) with (external_location='s3a://testbucket/usertable',format='PARQUET');

And after:

Select * from users1;

select * from users2;
  • Related