I have two DataFrames, each containing a max timestamp value.
val Table1max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/ab12")
Table1max.createOrReplaceTempView("temp")
val table2max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/abc")
table2max.createOrReplaceTempView("temp1")
Then I select the max update date from both:
val table1maxvalue = spark.sql(s"select max(UPDATE_DATE) from temp")
val table2maxvalue= spark.sql(s"select max(UPDATE_DATE) from temp1")
Here table1maxvalue and table2maxvalue are DataFrames, each with a single row and a single column:
table1maxvalue
--------------------
| max(UPDATE_DATE)|
--------------------
|2022-05-02 01:04:...|
--------------------
table2maxvalue
--------------------
| max(UPDATE_DATE)|
--------------------
|2022-05-02 01:04:...|
--------------------
Now, how can I compare table1maxvalue with table2maxvalue and do something depending on the result? For example:
if(table1maxvalue<table2maxvalue){
  // do something
}
Because these are DataFrames, I get this error: value >= is not a member of org.apache.spark.sql.DataFrame
Please suggest a way to do this.
CodePudding user response:
You are trying to compare one DataFrame to another DataFrame, and DataFrame does not define comparison operators. You need to take the first row of each DataFrame and then retrieve the value from that row.
In this case you can use the following:
table1maxvalue       // DataFrame
  .head()            // get the first row
  .getTimestamp(0)   // get the first column as a java.sql.Timestamp (use getDate(0) if the column is a DateType)
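Putting that together for both DataFrames, here is a minimal sketch of the comparison. It assumes UPDATE_DATE is a TimestampType column; `maxTimestamp` and `isEarlier` are hypothetical helper names, not part of Spark. Because max() over an empty table returns null, the value is wrapped in an Option.

```scala
import java.sql.Timestamp
import org.apache.spark.sql.DataFrame

// Hypothetical helper: pull the single max(UPDATE_DATE) value out of a one-row DataFrame.
// Assumes column 0 is a TimestampType column.
// Returns None when the aggregate is null (for example, when the source table is empty).
def maxTimestamp(df: DataFrame): Option[Timestamp] = {
  val row = df.head()
  if (row.isNullAt(0)) None else Some(row.getTimestamp(0))
}

// Hypothetical helper: true only when both values are present and a is strictly earlier than b.
def isEarlier(a: Option[Timestamp], b: Option[Timestamp]): Boolean =
  (a, b) match {
    case (Some(x), Some(y)) => x.before(y)
    case _                  => false
  }
```

With these in place, the condition from the question could be written as `if (isEarlier(maxTimestamp(table1maxvalue), maxTimestamp(table2maxvalue))) { ... }`. For other comparisons, `after` and `compareTo` on java.sql.Timestamp work the same way.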