I want to check whether a Delta table in an S3 bucket is actually a Delta table. I am trying to do this with:
from delta import *
from delta.tables import DeltaTable
from pyspark.sql import SparkSession
from pyspark.sql.functions import *

spark = SparkSession.builder \
    .appName('test') \
    .getOrCreate()

if DeltaTable.isDeltaTable(spark, "s3a://landing-zone/table_name/year=2022/month=2/part-0000-xyz.snappy.parquet"):
    print("bla")
else:
    print("blabla")
This code runs forever without returning any result. I tested it with a local Delta table, and there it works. When I trim the path so it stops after the actual table name, the code shows the same behavior. I also created a boto3 client, and I can see the bucket list when calling s3.list_buckets(). Do I need to pass the client into the if statement somehow?
Thanks a lot in advance!
CodePudding user response:
I am an idiot. I forgot that it is not enough to just create a boto3 client; that only connects boto3 to S3. I also have to give the Spark session itself access to S3 via

spark._jsc.hadoopConfiguration().set(...)
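
For anyone hitting the same thing, here is a minimal sketch of what that configuration can look like. The fs.s3a.* keys are standard hadoop-aws settings (so the hadoop-aws jar has to be on the classpath), and the credential values and bucket path below are placeholders you need to replace with your own:

from delta.tables import DeltaTable
from pyspark.sql import SparkSession

spark = SparkSession.builder \
    .appName('test') \
    .getOrCreate()

# Hand the S3 credentials to Spark's S3A filesystem. boto3 settings are
# not picked up by Spark, so this has to be configured separately.
hadoop_conf = spark._jsc.hadoopConfiguration()
hadoop_conf.set("fs.s3a.access.key", "<AWS_ACCESS_KEY_ID>")      # placeholder
hadoop_conf.set("fs.s3a.secret.key", "<AWS_SECRET_ACCESS_KEY>")  # placeholder
hadoop_conf.set("fs.s3a.impl", "org.apache.hadoop.fs.s3a.S3AFileSystem")

# isDeltaTable should be pointed at the table root (the directory that
# contains _delta_log), not at a partition folder or a single parquet file.
if DeltaTable.isDeltaTable(spark, "s3a://landing-zone/table_name"):
    print("bla")
else:
    print("blabla")

The same settings can also be supplied when building the session, e.g. .config("spark.hadoop.fs.s3a.access.key", ...), which avoids reaching into the private _jsc attribute.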