Comparing NULL with a timestamp gives different results in Databricks with Spark SQL


I have two tables and am comparing a few columns between them. creatdttm in table1 is a timestamp, while all columns in table2 are strings. I am comparing creatdttm with effdt and expdt, and the expdt column contains some NULL values. In the join I use tb2.exp_dt > tb1.creat_dttm. When both values are dates the comparison works fine, but when tb2.exp_dt is NULL and tb1.creat_dttm is a date, the condition evaluates to false and no records are selected. But sometimes it comes out true, and I was expecting it to be true. In what case can a NULL be greater than a value? Below is the query I am using.

select tb1.mbrpid as casid,
       case
         when tb2.meck is not null
              and tb2.effdt is not null
              and tb2.effdt <= tb1.creatdttm
              and (tb2.expdt is null
                   or tb2.expdt > tb1.creatdttm)
           then tb2.meck
         else tb1.facetmbrid
       end as mbrfacetid
from mbr_cov_interim tb1
left outer join ecods_mbr_hist tb2
  on (tb1.mbr_id = tb2.mbr_id
      and tb2.eff_dt <= tb1.creat_dttm
      and tb2.exp_dt > tb1.creat_dttm)

CodePudding user response:

NULL values are very tricky. In standard SQL, and Spark SQL follows the standard here, nearly every comparison involving NULL evaluates to NULL (unknown) rather than true or false, and in a WHERE clause or join condition an unknown result is treated the same as false: the row is not selected.

None of these tests is ever true (each evaluates to NULL):

NULL = NULL
NULL <> NULL
NULL > 'X'
NULL < 'X'
NULL <> 'X'

The only tests that behave the way you'd expect are IS NULL and IS NOT NULL:

NULL is null --(true)
NULL is not null --(false)
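
You can check this behavior directly in a Spark SQL session; a minimal sketch (the aliases eq, gt and chk are arbitrary):

select null = null  as eq,   -- NULL, not false
       null > 'X'   as gt,   -- NULL, not false
       null is null as chk   -- true

In a WHERE clause or join condition, a NULL (unknown) result filters the row out just as false would.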

More details should be available in your database's documentation.
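
Applied to the query in the question: if a NULL exp_dt is supposed to mean "no expiry" (the row is still in effect), the join condition has to say so explicitly, exactly as the CASE expression already does. A sketch using the question's own tables and columns:

left outer join ecods_mbr_hist tb2
  on (tb1.mbr_id = tb2.mbr_id
      and tb2.eff_dt <= tb1.creat_dttm
      and (tb2.exp_dt is null              -- keep rows with no expiry date
           or tb2.exp_dt > tb1.creat_dttm))

Without the IS NULL branch, every row whose exp_dt is NULL fails the comparison and is silently dropped from the join.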

CodePudding user response:

All columns in table2 are string.

Why are you doing this? Why don't you use appropriate data types? As it is, you are comparing a datetime column with a string. The DBMS must convert one of the two values to make the comparison. To be able to do this, it must apply rules, and those rules depend on session settings.

Is '01/02/20' the first of February or the second of January? Is '20/10/21' October 20, 2021 or October 21, 2020? Don't leave the conversion to chance. Make it explicit, because you know the date format in the string, while the DBMS does not and has to guess. And it may guess wrong, which would explain why you are not seeing the results you expect.

An example of an explicit conversion:

where to_date(tb2.eff_dt, 'yyyy-MM-dd') <= tb1.creat_dttm

See https://docs.databricks.com/sql/language-manual/functions/to_date.html
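
Putting the two points together, the join could be written with explicit conversion and explicit NULL handling. A sketch, assuming the date strings in table2 are in 'yyyy-MM-dd' format (adjust the pattern to the real format):

left outer join ecods_mbr_hist tb2
  on (tb1.mbr_id = tb2.mbr_id
      and to_date(tb2.eff_dt, 'yyyy-MM-dd') <= tb1.creat_dttm
      and (tb2.exp_dt is null
           or to_date(tb2.exp_dt, 'yyyy-MM-dd') > tb1.creat_dttm))

Note that to_date(NULL, ...) returns NULL, so the IS NULL branch is still required to keep rows with no expiry date.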
