Using Scala here:
val df = spark.read.format("jdbc")
  .option("url", "<host url>")
  .option("dbtable", "UPPERCASE_SCHEMA.table_name")
  .option("user", "postgres")
  .option("password", "<password>")
  .option("numPartitions", 50)
  .option("fetchsize", 20)
  .load()
The database the above code reads from has many schemas, and they are all named in uppercase letters (UPPERCASE_SCHEMA).
No matter how I try to denote that the schema is in all caps, Spark converts it to lowercase, which then fails to match anything in the actual DB.
I've tried making it a variable, explicitly marking it as uppercase, etc., in multiple languages, but no luck.
Would anyone know a workaround?
When I went into the actual DB (Postgres) and temporarily renamed the schema to all lowercase, it worked absolutely fine.
CodePudding user response:
Try setting spark.sql.caseSensitive to true (it is false by default):
spark.conf.set("spark.sql.caseSensitive", true)
You can see its definition in the source code: https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala#L833
In addition, you can see in the JDBCWriteSuite how it affects the JDBC connector: https://github.com/apache/spark/blob/ee95ec35b4f711fada4b62bc27281252850bb475/sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCWriteSuite.scala
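Putting it together, here is a minimal sketch that sets the flag before running the read from the question (the URL and password are the same placeholders as above; whether the flag alone is enough may depend on your Spark version):

// Enable case-sensitive identifier handling before reading
spark.conf.set("spark.sql.caseSensitive", true)

// The original JDBC read against the uppercase schema
val df = spark.read.format("jdbc")
  .option("url", "<host url>")
  .option("dbtable", "UPPERCASE_SCHEMA.table_name")
  .option("user", "postgres")
  .option("password", "<password>")
  .option("numPartitions", 50)
  .option("fetchsize", 20)
  .load()

For context: Postgres itself folds unquoted identifiers to lowercase, which is consistent with your observation that renaming the schema to lowercase made the original read work.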