I can select data from a database in Spark like this:
val df = spark.read.
  format("jdbc").
  option("url", "jdbc:db://<DB server>:<DB port>/<dbname>").
  option("user", "<username>").
  option("password", "<password>").
  option("dbtable", "<your table>").
  load()
But how can I close the DB connection after this? Is it closed automatically?
CodePudding user response:
Keep in mind that Spark is a distributed system. Each executor requires its own connection(s) to the database (e.g. when doing partitioned reads), so there is simply no way you could close all of the open connections manually.
Spark takes care of all of this automatically, so it is nothing you have to worry about.
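To make the "one connection per executor task" point concrete, here is a rough sketch of a partitioned read. It assumes a reachable database, a JDBC driver on the classpath, and a numeric column named id in the table; the column name and the bounds are made up for illustration. The partitionColumn, lowerBound, upperBound and numPartitions options are the standard Spark JDBC options.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell `spark` already exists; this line is only needed in a standalone app.
val spark = SparkSession.builder().appName("jdbc-partitioned-read").getOrCreate()

val df = spark.read.
  format("jdbc").
  option("url", "jdbc:db://<DB server>:<DB port>/<dbname>").
  option("user", "<username>").
  option("password", "<password>").
  option("dbtable", "<your table>").
  option("partitionColumn", "id").   // hypothetical numeric column
  option("lowerBound", "1").         // bounds only set the partition stride, they do not filter rows
  option("upperBound", "1000000").
  option("numPartitions", "8").      // also caps the number of concurrent JDBC connections
  load()
```

With settings like these, each of the (up to) 8 partitions is read by its own task over its own connection, and none of those connections is ever exposed to the driver code, which is why there is nothing to close by hand.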
CodePudding user response:
Spark opens and closes JDBC connections as needed: to extract and validate metadata when building the query execution plan, to save DataFrame partitions to a database, or to compute the DataFrame when a scan is triggered by a Spark action. See the JdbcRelationProvider, JdbcUtils, and JDBCRDD sources for where and how exactly it's done.
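As a rough illustration of the "open, use, always close" lifecycle described above, here is a small self-contained sketch. It is not Spark's actual code: FakeConnection, readPartition and ConnectionLifecycleSketch are hypothetical names, and the fake connection stands in for java.sql.Connection so the example runs without a database.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical stand-in for java.sql.Connection; records open/close events in `log`.
final class FakeConnection(val partition: Int, log: ListBuffer[String]) extends AutoCloseable {
  log += s"open $partition"
  def fetchRows(): Seq[String] = Seq(s"row-from-partition-$partition")
  override def close(): Unit = log += s"close $partition"
}

object ConnectionLifecycleSketch {
  // Reads one partition: open a connection, fetch the rows, and close it
  // even if fetching throws.
  def readPartition(partition: Int, log: ListBuffer[String]): Seq[String] = {
    val conn = new FakeConnection(partition, log)
    try conn.fetchRows()
    finally conn.close()
  }

  def main(args: Array[String]): Unit = {
    val log = ListBuffer.empty[String]
    val rows = (0 until 3).flatMap(p => readPartition(p, log))
    println(rows.mkString(", "))
    println(log.mkString(", ")) // open 0, close 0, open 1, close 1, open 2, close 2
  }
}
```

The point of the sketch is only that the code reading a partition owns the connection and releases it when the read finishes, so the caller holding the DataFrame never sees a connection object.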