SQL:
SELECT 'true' = true;
I know the result is true, but I want to know how Spark compares the two:
- cast true as 'true' and then just compare two string type.
- cast 'true' as true and then compare two boolean type.
Which way does Spark use?
The same happens when comparing date and string or timestamp and string:
SELECT '2022-01-02' > date('2022-01-03');
SELECT '2022-08-12 01:00:00.123456' < timestamp('2019-08-12 01:00:00.123456');
I'm a novice at Spark. Thank you in advance!
CodePudding user response:
The easiest way to learn is to ask Spark to explain the plan.
scala> spark.sql("SELECT 'true' = true").explain
== Physical Plan ==
*(1) Project [true AS (CAST(true AS BOOLEAN) = true)#53]
- *(1) Scan OneRowRelation[]
scala> spark.sql("SELECT '2022-01-02' > date('2022-01-03')").explain
== Physical Plan ==
*(1) Project [false AS (CAST(2022-01-02 AS DATE) > CAST(2022-01-03 AS DATE))#55]
- *(1) Scan OneRowRelation[]
scala> spark.sql("SELECT '2022-08-12 01:00:00.123456' < timestamp('2019-08-12 01:00:00.123456')").explain
== Physical Plan ==
*(1) Project [false AS (CAST(2022-08-12 01:00:00.123456 AS TIMESTAMP) < CAST(2019-08-12 01:00:00.123456 AS TIMESTAMP))#57]
- *(1) Scan OneRowRelation[]
We can clearly see that in all cases Spark attempts to cast string to the other type (boolean, date, timestamp, int, ...).