I am trying to convert SQL code to PySpark SQL. The SELECT statement that picks the columns from a table contains the following:
Select a.`(column1|column2|column3)?+.+`
,trim(column c) from Table a;
I would like to understand what the expression a.`(column1|column2|column3)?+.+` resolves to and what it actually implies. How should I handle it when converting the SQL to PySpark?
CodePudding user response:
That is a way of selecting columns by name with a regular expression. The regex matches every column of a except column1, column2 and column3, so the statement selects all columns other than those three.
It is Spark's equivalent of Hive's quoted identifiers. See also the Spark documentation.
Be aware that this behavior must be enabled first by running the following command (shown in Scala syntax; in PySpark, pass truncate=False instead of false):
spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true").show(false)
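If you would rather not depend on the regex setting, the same exclusion can be written with the DataFrame API by filtering the column list yourself. The following is only a minimal sketch: the helper names columns_except and select_except and the DataFrame name table_a are hypothetical, and the only things assumed about the DataFrame are its columns attribute and its select method.

```python
def columns_except(all_columns, excluded):
    """Return the names in all_columns that are not in excluded, keeping order."""
    excluded = set(excluded)
    return [name for name in all_columns if name not in excluded]


def select_except(df, excluded):
    """Select every column of df except the excluded ones.

    df is assumed to be a pyspark.sql.DataFrame; only df.columns and
    df.select(...) are used, so no Spark import is needed here.
    """
    return df.select(*columns_except(df.columns, excluded))


# Hypothetical usage, assuming a DataFrame named table_a already exists:
#   result = select_except(table_a, ["column1", "column2", "column3"])

# The name filtering itself can be checked without Spark:
print(columns_except(["column1", "column2", "column3", "c", "d"],
                     ["column1", "column2", "column3"]))
# prints ['c', 'd']
```

The trimmed column from the original query would then be added separately, for example with pyspark.sql.functions.trim on that column.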