Accessing datacatalog table in Glue properly

Time:03-08

I created a table in Athena from an S3 source, without using a crawler. It shows up in my Data Catalog. However, when I try to access it from a Python job in Glue ETL, it appears to have no columns and no data. Accessing a column raises the following error: AttributeError: 'DataFrame' object has no attribute '<COLUMN-NAME>'.

I am reading the table into a DynamicFrame the usual Glue way:

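from awsglue.context import GlueContext
from pyspark.context import SparkContext

glueContext = GlueContext(SparkContext.getOrCreate())
# Read the Data Catalog table into a DynamicFrame.
datasource = glueContext.create_dynamic_frame.from_catalog(
  database="datacatalog_database",
  table_name="table_name",
  transformation_ctx="datasource"
)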

print(f"Count: {datasource.count()}")
print(f"Schema: {datasource.schema()}")

The above logs Count: 0 and Schema: StructType([], {}), even though Athena shows the table has about 800,000 rows.

Side notes:

  • The ETL job in question has AWSGlueServiceRole attached.
  • I also tried the Glue Visual Editor. It showed the Data Catalog database and table in question, but produced the same error.

CodePudding user response:

It looks like the S3 bucket has multiple nested folders inside it. For Glue to read these folders, pass additional_options={"recurse": True} to from_catalog(). Glue will then read the files under the table's S3 path recursively.
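
Here is a minimal sketch of how that could look, assuming the same database and table names as in the question. The helper name read_catalog_table_recursively is made up for illustration and is not part of the Glue API; it only wraps the from_catalog() call with the recurse option set.

```python
def read_catalog_table_recursively(glue_context, database, table_name):
    """Read a Data Catalog table, descending into nested S3 folders.

    glue_context is expected to be the job's GlueContext. This helper is
    a hypothetical wrapper for illustration, not a Glue function.
    """
    return glue_context.create_dynamic_frame.from_catalog(
        database=database,
        table_name=table_name,
        transformation_ctx="datasource",
        # Ask the S3 reader to recurse into subfolders under the table path.
        additional_options={"recurse": True},
    )


# Inside the Glue job, with glueContext already created:
# datasource = read_catalog_table_recursively(
#     glueContext, "datacatalog_database", "table_name"
# )
# print(f"Count: {datasource.count()}")
```

If the count is still 0 after this, the nested folders are probably not the cause, and it is worth checking the S3 location recorded for the table in the Data Catalog.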
