Home > Software design >  How to create a external table when date is in between s3 path
How to create a external table when date is in between s3 path

Time:09-27

I have s3 bucket as below,

s3://rawzone/toyoto/2020-11-04/

inside each date I have 4 files

 s3://rawzone/toyoto/2020-11-04/Company.json
 s3://rawzone/toyoto/2020-11-04/sales.json
 s3://rawzone/toyoto/2020-11-04/transport.json
 s3://rawzone/toyoto/2020-11-04/preaquisitions.json

and I have so many dates like this

 s3://rawzone/toyoto/2020-11-05/Company.json
 s3://rawzone/toyoto/2020-11-05/sales.json
 s3://rawzone/toyoto/2020-11-05/transport.json
 s3://rawzone/toyoto/2020-11-05/preaquisitions.json

I need to create a table for across all dates for Company,sales,transport,preaquisitions

How can I retrieve this data from s3 bucket for each table?

So far tried (kept ??? where I don't know how to replace it)

 create or replace external table toyoto.PB_compnay
    (
       columns mapping......
    )
    partition by ???
    with location = @raw_zone/pitchbook/??????
    file_format = json_format
    aws_sns_topic='arn:aws:sns:us-west-1:5438:dev-autore'
    auto_refresh = true 

CodePudding user response:

The practice is to partition the data files based on increments of time; or, if the data files are staged from multiple sources, to partition by a data source identifier and date or timestamp.

If you prefer to have 4 tables based on the filenames, I would recommend having a pattern that would pick up all the files that have that name.

create or replace external table toyoto.PB_compnay
 (
   columns mapping......
 )
 with location = @stage/path/
 file_format = json_format
 pattern='.*/.*/Company[.]json'
 aws_sns_topic='arn:aws:sns:us-xxx:xxx:xxxx'
 auto_refresh = true
  • Related