AWS Glue Exclude Patterns

Time:03-03

I am working on a project that uses Glue 3.0 and PySpark to process large amounts of data between S3 buckets. I read the data from an S3 bucket into a DynamicFrame with GlueContext.create_dynamic_frame_from_options, with the recurse connection option set to True because the data is heavily nested. I only want to read files that end in meta.json, so I set the exclusions filter to exclude any files that end in data.csv: "exclusions": ['**.{txt, csv}', '**/*.data.csv', '**.data.csv', '*.data.csv']. However, I consistently get the following error:

An error occurred while calling o90.pyWriteDynamicFrame. Unable to parse file: <filename>.data.csv

Is it possible to log the full S3 URI to the output logs, or to keep track of which files have and have not been processed? Why is it still trying to parse this file even though it is covered by the exclusions?

CodePudding user response:

The exclusions option has to be a string containing a JSON list of glob patterns, not a Python list:

"exclusions": "[\"**/*.txt\", \"**/*.csv\"]",
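To avoid hand-escaping the quotes, one option is to build the string with json.dumps. The following is only a minimal sketch: the helper name build_s3_connection_options and the bucket path are hypothetical, and the Glue call is left in comments because it needs a Glue job runtime.

```python
import json


def build_s3_connection_options(paths, exclude_patterns):
    """Return S3 connection options with exclusions as a JSON string.

    Hypothetical helper for illustration; it is not part of the Glue API.
    """
    return {
        "paths": list(paths),
        "recurse": True,
        # json.dumps turns the Python list into the JSON-list string
        # that the exclusions option expects.
        "exclusions": json.dumps(list(exclude_patterns)),
    }


options = build_s3_connection_options(
    ["s3://example-bucket/input/"],  # hypothetical bucket and prefix
    ["**/*.txt", "**/*.csv"],
)
print(options["exclusions"])  # prints ["**/*.txt", "**/*.csv"]

# Inside a Glue job, the options would then be passed along these lines:
# dyf = glue_context.create_dynamic_frame_from_options(
#     connection_type="s3",
#     connection_options=options,
#     format="json",
# )
```

The printed value matches the hand-escaped string above, so either form can be used in connection_options.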