Glue not able to recognize Delta Lake Python Library

Time:01-27

I am trying to use the Delta Lake Python library in my Glue job. However, the job does not recognize it, and I get the error "NameError: name 'DeltaTable' is not defined". Per the Glue Delta Lake documentation, I added the job parameter --datalake-formats = delta and also set the required Spark configuration:

.config("spark.sql.extensions","io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog")

My code fails at the line below:

deltaTable = DeltaTable.forPath(self.spark,self.dest_path_sdad)

Any ideas?

CodePudding user response:

These configuration properties enable the Delta Lake file format in Glue, so you can write spark.read.format("delta").load(...) or df.write.format("delta").save(...). But they don't provide the Python API, which is available as the delta-spark package. It can be made available to Glue with the --additional-python-modules option (doc).
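As a rough sketch of what this answer suggests, the job parameters could be assembled like this. The helper name build_glue_default_arguments and its delta_spark_version argument are hypothetical, introduced only for illustration; the version you pin (if any) is an assumption and should match the Delta Lake release bundled with your Glue version.

```python
def build_glue_default_arguments(delta_spark_version=None):
    """Return Glue job parameters that enable Delta Lake and request its Python API.

    delta_spark_version is a hypothetical, optional pin such as "2.1.0";
    when omitted, pip resolves whatever delta-spark release it considers current.
    """
    module = "delta-spark"
    if delta_spark_version:
        module = f"{module}=={delta_spark_version}"
    return {
        # Enables the Delta Lake format in the Glue job (from the question).
        "--datalake-formats": "delta",
        # Asks Glue to pip-install the package that ships delta.tables.
        "--additional-python-modules": module,
    }


if __name__ == "__main__":
    print(build_glue_default_arguments("2.1.0"))
```

The resulting dictionary is the kind of mapping you would enter as job parameters in the Glue console, or pass as DefaultArguments when creating the job with boto3.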

CodePudding user response:

I was missing the import statement:

from delta.tables import *
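For anyone hitting the same NameError, here is a minimal sketch of the fix using an explicit import instead of the wildcard, which makes it clearer where DeltaTable comes from. The helper name open_delta_table is hypothetical. The import sits inside the function only so that this snippet loads on machines without delta-spark installed; in a Glue script it would normally go at the top of the file.

```python
def open_delta_table(spark, path):
    """Return a DeltaTable handle for an existing Delta table at `path`.

    `spark` is an active SparkSession and `path` is the table location,
    for example the self.dest_path_sdad value from the question.
    """
    # Without this import, the name DeltaTable is undefined and Python
    # raises: NameError: name 'DeltaTable' is not defined
    from delta.tables import DeltaTable

    return DeltaTable.forPath(spark, path)
```

In the question's code this would be called as open_delta_table(self.spark, self.dest_path_sdad).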