Home > Back-end >  How Spark compares data of type boolean and string
How Spark compares data of type boolean and string

Time:12-15

SQL:

SELECT 'true' = true;

I know the result is true, but I want to know how Spark compares the two:

  • cast true as 'true' and then just compare two string type.
  • cast 'true' as true and then compare two boolean type.

Which way does Spark use?

The same happens when comparing date and string or timestamp and string:

SELECT '2022-01-02' > date('2022-01-03');
SELECT '2022-08-12 01:00:00.123456' < timestamp('2019-08-12 01:00:00.123456');

I'm a novice at Spark. Thank you in advance!

CodePudding user response:

The easiest way to learn is to ask Spark to explain the plan.

scala> spark.sql("SELECT 'true' = true").explain
== Physical Plan ==
*(1) Project [true AS (CAST(true AS BOOLEAN) = true)#53]
 - *(1) Scan OneRowRelation[]

scala> spark.sql("SELECT '2022-01-02' > date('2022-01-03')").explain
== Physical Plan ==
*(1) Project [false AS (CAST(2022-01-02 AS DATE) > CAST(2022-01-03 AS DATE))#55]
 - *(1) Scan OneRowRelation[]

scala> spark.sql("SELECT '2022-08-12 01:00:00.123456' < timestamp('2019-08-12 01:00:00.123456')").explain
== Physical Plan ==
*(1) Project [false AS (CAST(2022-08-12 01:00:00.123456 AS TIMESTAMP) < CAST(2019-08-12 01:00:00.123456 AS TIMESTAMP))#57]
 - *(1) Scan OneRowRelation[]

We can clearly see that in all cases Spark attempts to cast string to the other type (boolean, date, timestamp, int, ...).

  • Related