I am working on Databricks and use Spark to load and publish data to a SQL database. One of the tasks I need to do is to get the schema of a table in my database so I can see the data type of each column. The only way I have been able to do it so far is by loading the whole table and then extracting the schema:
df_tableA = spark.read.format("jdbc") \
    .option("url", datasource_url) \
    .option("dbtable", table_name) \
    .option("user", dbuser) \
    .option("password", dbpassword) \
    .option("driver", driver) \
    .load()
However, my goal is to get just the schema without loading the entire table, both to speed up the process and to avoid overloading memory.
Could you suggest a smart and elegant way to achieve this?
CodePudding user response:
Normally, load
does not pull the table into memory: Spark's JDBC reads are lazy, so load only fetches metadata until an action is executed. But if you want to be explicit about it, you can pass a dummy query to dbtable
that is guaranteed to return no rows, like .option("dbtable", "(select * from table where 1 = 2) t"). The database still returns the column metadata for the empty result set, so the resulting DataFrame's schema is fully populated.
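To see why the where 1 = 2 trick works without needing a Spark cluster or a live JDBC source, here is a minimal sketch using Python's built-in sqlite3 module (the table name and columns are made up for the demo): the database executes the query, returns zero rows, yet still sends back the column metadata of the result set.

```python
import sqlite3

# In-memory database standing in for the real SQL source.
conn = sqlite3.connect(":memory:")
conn.execute("create table tableA (id integer, name text, price real)")
conn.executemany(
    "insert into tableA values (?, ?, ?)",
    [(1, "a", 9.5), (2, "b", 3.25)],
)

# The dummy predicate guarantees an empty result set...
cur = conn.execute("select * from tableA where 1 = 2")
rows = cur.fetchall()

# ...but the cursor still carries the column metadata.
columns = [d[0] for d in cur.description]
print(rows)     # → []
print(columns)  # → ['id', 'name', 'price']
```

Spark's JDBC reader relies on the same behaviour: it wraps dbtable in a SELECT to probe the result-set metadata, so a subquery that matches no rows yields the schema essentially for free.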