I want to run a repair job (MSCK REPAIR TABLE) in Azure Databricks, excluding 4 tables. What am I doing wrong?
database = "demo"
tables = spark.catalog.listTables(database)
tables = spark.sql("show tables in demo")
tables = tables.filter((tables.tableName != "example1") & (tables.tableName != "example2") & (tables.tableName != "example3") & (tables.tableName != "example4"))
for i in tables.collect():
    print(i)
for table in tables:
    spark.sql(f"MSCK REPAIR TABLE {database}.{table.name}")
I get the following error message:
AttributeError Traceback (most recent call last)
<command-2033459303290955> in <module>
1 for i in tables:
----> 2 spark.sql(f"MSCK REPAIR TABLE {database}.{table.name}")
AttributeError: 'function' object has no attribute 'name'
Answer:
There are two errors:
- You don't collect the data to the driver node. In this case you just iterate over the list of columns in the tables DataFrame.
- You use the wrong name for the column: it's not name, but tableName.
This should work:
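database = "demo"
# SHOW TABLES returns one row per table, with the table name in the tableName column
tables = spark.sql(f"show tables in {database}")
# Keep every table except the four that should be skipped
tables = tables.filter((tables.tableName != "example1") & (tables.tableName != "example2") & (tables.tableName != "example3") & (tables.tableName != "example4"))
# collect() brings the rows to the driver, so each `table` is a Row, not a column
for table in tables.collect():
    spark.sql(f"MSCK REPAIR TABLE {database}.{table.tableName}")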
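P.S. You also have duplicate code: the result of spark.catalog.listTables(database) is immediately overwritten by spark.sql("show tables in demo"), so one of those two lines can be dropped.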
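Alternatively, the spark.catalog.listTables(database) call from the question can replace the SHOW TABLES query entirely. It returns a plain Python list of Table objects that have a name attribute, so table.name works there and no collect() is needed. Below is a minimal sketch, assuming an active SparkSession named spark (predefined in Databricks notebooks). The helper names statements_to_run and repair_tables are hypothetical. Skipping temporary views via the isTemporary field is also my own assumption, since MSCK REPAIR TABLE is meant for tables, not views.

```python
# Sketch: run MSCK REPAIR TABLE on every table in a database except an
# exclusion list, using spark.catalog.listTables() (entries expose .name).
from typing import Iterable, List


def statements_to_run(database: str, table_names: Iterable[str],
                      excluded: Iterable[str]) -> List[str]:
    """Build one MSCK REPAIR TABLE statement per non-excluded table."""
    skip = set(excluded)  # set lookup instead of chained != conditions
    return [
        f"MSCK REPAIR TABLE {database}.{name}"
        for name in table_names
        if name not in skip
    ]


def repair_tables(spark, database: str, excluded: Iterable[str]) -> List[str]:
    """Repair each table in `database` that is not in `excluded`.

    `spark` is assumed to be an active SparkSession. Returns the
    statements that were executed.
    """
    # listTables() already returns a Python list, so no collect() is needed.
    # Temporary views are skipped (assumption: they should not be repaired).
    names = [t.name for t in spark.catalog.listTables(database)
             if not t.isTemporary]
    statements = statements_to_run(database, names, excluded)
    for stmt in statements:
        spark.sql(stmt)
    return statements
```

For the question's case this would be called as repair_tables(spark, "demo", ["example1", "example2", "example3", "example4"]). If some tables in the database are not partitioned, MSCK REPAIR TABLE may reject them; those names can simply be added to the exclusion list.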