I joined two tables and tried to get a max timestamp value of two timestamp fields. This is my SQL statement:
SELECT
rts.DataLakeModified_DateTime,
rtt.DataLakeModified_DateTime,
(
SELECT MAX(DataLakeModifiedDateTime)
FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
ON rtt.TRANSACTIONID = rts.TRANSACTIONID
AND rtt.CHANNEL = rts.CHANNEL
AND rtt.STORE = rts.STORE
AND rtt.TERMINAL = rts.TERMINALID
But it has an error:
Error in SQL statement: AnalysisException: cannot evaluate expression outer(rts.DataLakeModified_DateTime) in inline table definition; line 6 pos 17
I'm using Spark3.2.1 and MSSQL database. And I'm running it on Azure Databricks notebook using 104 LTS cluster (include Apache Spark3.2.1, Scala 2.12).
Do you have any solution to solve this problem?
CodePudding user response:
The error seems to be because of the usage of the following SQL statement to get the maximum timestamp value.
SELECT MAX(DataLakeModifiedDateTime)
FROM VALUES(rts.DataLakeModified_DateTime), (rtt.DataLakeModified_DateTime) AS AllDates (DataLakeModifiedDateTime)
- I tried with 2 sample tables
tb1
andtb2
which have timestamp fieldst1
andt2
.The following is how the tables look.
- When I used to get maximum time stamp values using the same syntax that you used, it gave the same error.
- Instead, when I tried it using the
CASE
statement, it gave the desired output.
So, modify your SQL query as shown below:
SELECT
rts.DataLakeModified_DateTime,
rtt.DataLakeModified_DateTime,
(
CASE WHEN rts.DataLakeModified_DateTime > rtt.DataLakeModified_DateTime THEN rts.DataLakeModified_DateTime ELSE rtt.DataLakeModified_DateTime END
) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
ON rtt.TRANSACTIONID = rts.TRANSACTIONID
AND rtt.CHANNEL = rts.CHANNEL
AND rtt.STORE = rts.STORE
AND rtt.TERMINAL = rts.TERMINALID
CodePudding user response:
I found another solution with GREATEST
function, simply.
GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime
It looks like it's simply than WHEN ... THEN ... clause.
Finally, the SQL statement is :
SELECT
rts.DataLakeModified_DateTime,
rtt.DataLakeModified_DateTime,
GREATEST(rts.DataLakeModified_DateTime, rtt.DataLakeModified_DateTime) AS DataLakeModifiedDateTime
FROM RetailTransactionSalesTrans_Full rts
LEFT JOIN RetailTransactionTable_Full rtt
ON rtt.TRANSACTIONID = rts.TRANSACTIONID
AND rtt.CHANNEL = rts.CHANNEL
AND rtt.STORE = rts.STORE
AND rtt.TERMINAL = rts.TERMINALID
It worked correctly. Thank you!