I created a table in Athena from an S3 source without using a crawler. It shows up in my Data Catalog. However, when I try to access it from a Python job in Glue ETL, it appears to have no columns and no data. Accessing a column raises the following error: AttributeError: 'DataFrame' object has no attribute '<COLUMN-NAME>'.
I am reading the table as a dynamic frame in the usual Glue way:
datasource = glueContext.create_dynamic_frame.from_catalog(
database="datacatalog_database",
table_name="table_name",
transformation_ctx="datasource"
)
print(f"Count: {datasource.count()}")
print(f"Schema: {datasource.schema()}")
The above logs Count: 0 and Schema: StructType([], {}), whereas the Athena table shows I have around 800,000 rows.
Sidenotes:
- The ETL job concerned has AWSGlueServiceRole attached.
- I tried the Glue Visual Editor as well. It showed the Data Catalog database and table concerned but, sadly, gave the same error.
CodePudding user response:
It looks like the S3 bucket has multiple nested folders inside it. For Glue to read these folders, you need to pass additional_options = {"recurse": True} to your from_catalog() call. This makes Glue read records recursively from the S3 files.
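As a rough sketch of that change, assuming the same database and table names as in the question: the helper names build_catalog_read_args and read_table are made up for illustration and are not part of the Glue API. Only from_catalog() and its additional_options argument come from Glue itself.

```python
def build_catalog_read_args(database, table_name, recurse=True):
    """Build the keyword arguments for create_dynamic_frame.from_catalog().

    Hypothetical helper for illustration; not part of the Glue API.
    """
    return {
        "database": database,
        "table_name": table_name,
        "transformation_ctx": "datasource",
        # "recurse": True asks Glue to also read files in subfolders
        # under the table's S3 location.
        "additional_options": {"recurse": recurse},
    }


def read_table(glue_context, database, table_name):
    """Read a Data Catalog table as a DynamicFrame with recursion enabled."""
    args = build_catalog_read_args(database, table_name)
    return glue_context.create_dynamic_frame.from_catalog(**args)


# Inside the Glue job, where glueContext already exists:
# datasource = read_table(glueContext, "datacatalog_database", "table_name")
# print(f"Count: {datasource.count()}")
# print(f"Schema: {datasource.schema()}")
```

If the count is still 0 after this, the nested folders are probably not the cause, and the table's S3 location in the Data Catalog is worth checking next.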