Home > Software design >  Databricks SQL error: cannot evaluate expression outer in inline table definition
Databricks SQL error: cannot evaluate expression outer in inline table definition

Time:06-25

I joined two tables and tried to get a max timestamp value of two timestamp fields. This is my SQL statement:

SELECT 
   rts.DataLakeModified_DateTime, 
   rtt.DataLakeModified_DateTime,
   (
     SELECT MAX(DataLakeModifiedDateTime)
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
   ) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID
        AND rtt.CHANNEL = rts.CHANNEL
        AND rtt.STORE = rts.STORE
        AND rtt.TERMINAL = rts.TERMINALID

But it has an error:

Error in SQL statement: AnalysisException: cannot evaluate expression outer(rts.DataLakeModified_DateTime) in inline table definition; line 6 pos 17

I'm using Spark3.2.1 and MSSQL database. And I'm running it on Azure Databricks notebook using 104 LTS cluster (include Apache Spark3.2.1, Scala 2.12).

Do you have any solution to solve this problem?

CodePudding user response:

The error seems to be because of the usage of the following SQL statement to get the maximum timestamp value.

SELECT MAX(DataLakeModifiedDateTime) 
     FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime) 
  • I tried with 2 sample tables tb1 and tb2 which have timestamp fields t1 and t2.The following is how the tables look.

enter image description here

  • When I used to get maximum time stamp values using the same syntax that you used, it gave the same error.

enter image description here

  • Instead, when I tried it using the CASE statement, it gave the desired output.

enter image description here

So, modify your SQL query as shown below:

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   ( 
     CASE WHEN rts.DataLakeModified_DateTime > rtt.DataLakeModified_DateTime THEN rts.DataLakeModified_DateTime ELSE rtt.DataLakeModified_DateTime END 
) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

CodePudding user response:

I found another solution with GREATEST function, simply.

GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 

It looks like it's simply than WHEN ... THEN ... clause.

Finally, the SQL statement is :

SELECT  
   rts.DataLakeModified_DateTime,  
   rtt.DataLakeModified_DateTime, 
   GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime 
FROM RetailTransactionSalesTrans_Full rts 
LEFT JOIN RetailTransactionTable_Full rtt 
    ON rtt.TRANSACTIONID = rts.TRANSACTIONID 
        AND rtt.CHANNEL = rts.CHANNEL 
        AND rtt.STORE = rts.STORE 
        AND rtt.TERMINAL = rts.TERMINALID 

It worked correctly. Thank you!

  • Related