How to make Spark SQL case insensitive with field values


How can I write Spark SQL commands that return case-insensitive matches on field values?

Example: Sample_DF below

+------+
|  name|
+------+
|Johnny|
|Robert|
|ROBERT|
|robert|
+------+

By default, Spark SQL appears to be case sensitive when matching on field values:

spark.sql("select name from Sample_DF where status like '%Robert%'").show
 ------ 
|name  |
 ------ 
|Robert|
 ------ 

How can I make the above query case insensitive so that it returns the result below, assuming there is a large list of "Robert" values in various lower/uppercase variations?

+------+
|  name|
+------+
|Robert|
|ROBERT|
|robert|
+------+

As I understand it, Spark SQL does not support the MSSQL COLLATE clause.

CodePudding user response:

You can convert all characters to lowercase before comparing:

spark.sql("select status from Sample_DF where lower(status) like '%' || lower('Robert') || '%'").show

There is also a built-in function for this; see: How to change case of whole column to lowercase?

CodePudding user response:

If you want to look at all the names in the name column, you could use the lower function, which converts all characters to lowercase.

from pyspark.sql import functions as F

Sample_DF.select(F.lower('name')).show()
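
Note that this only displays the lowercased values; to actually filter as the question asks, lower can be combined with a filter. A short sketch, assuming Sample_DF is the question's DataFrame and functions is imported as F:

from pyspark.sql import functions as F

# Keep only rows whose lowercased name contains the search term
Sample_DF.filter(F.lower(F.col("name")).contains("robert")).show()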