I am trying to use the Delta Lake Python library in my Glue job. However, the job does not recognize it and I get the error "NameError: name 'DeltaTable' is not defined". Per the Glue-DeltaLake documentation, I added the parameter --datalake-formats = delta and also updated the required Spark configuration:
.config("spark.sql.extensions","io.delta.sql.DeltaSparkSessionExtension")
.config("spark.sql.catalog.spark_catalog","org.apache.spark.sql.delta.catalog.DeltaCatalog")
My code fails at the line below:
deltaTable = DeltaTable.forPath(self.spark,self.dest_path_sdad)
Any ideas?
CodePudding user response:
These configuration properties set Glue up with the Delta Lake file format, so you can write spark.read.format("delta").load(...) or df.write.format("delta").save(...). But they don't provide the Python API that is available as the delta-spark package. It could be made available to Glue by using the --additional-python-modules option (doc).
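One way to tell whether the Python API is present at all is to check for the package before importing from it. The following is only a minimal sketch: the helper name delta_python_api_available is made up for illustration, and it assumes the delta-spark package exposes a top-level module named delta (which is what the import from delta.tables relies on).

```python
import importlib.util


def delta_python_api_available() -> bool:
    """Return True if a top-level `delta` package can be found.

    Hypothetical helper: find_spec returns None when the package is not
    installed, so no import error is raised here.
    """
    return importlib.util.find_spec("delta") is not None


if not delta_python_api_available():
    # Assumption: passing delta-spark to --additional-python-modules installs it.
    print("delta package not found; consider --additional-python-modules delta-spark")
```

If this reports the package as present, a NameError on DeltaTable points to a missing import in the script rather than a missing module.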
CodePudding user response:
I was missing the import statement:
from delta.tables import *
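As a variation, the single name can be imported explicitly instead of using a wildcard, which makes it obvious where DeltaTable comes from. This is a sketch only: the helper name load_delta_table is hypothetical, and the import sits inside the function just to keep the snippet self-contained; in a job script it would normally go at the top of the file.

```python
def load_delta_table(spark, path):
    """Open the Delta table stored at `path` (hypothetical helper).

    Assumes the delta package is importable in the Glue job.
    """
    # Explicit import of the one class that is needed.
    from delta.tables import DeltaTable

    return DeltaTable.forPath(spark, path)


# Usage inside the job, with the names from the question:
# deltaTable = load_delta_table(self.spark, self.dest_path_sdad)
```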