how to compare two data frames in spark scala?

Time:05-02

I have two data frames, each containing a max timestamp value.

val Table1max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/ab12")
Table1max.createOrReplaceTempView("temp") 

val table2max=spark.read.format("parquet").option("header","true").load(s"${SourcePath}/abc")
table2max.createOrReplaceTempView("temp1")

Then I select the max update date from both:

val table1maxvalue = spark.sql(s"select max(UPDATE_DATE) from temp")
val table2maxvalue= spark.sql(s"select max(UPDATE_DATE) from temp1")

Here table1maxvalue and table2maxvalue are dataframes.

table1maxvalue
+--------------------+
|    max(UPDATE_DATE)|
+--------------------+
|2022-05-02 01:04:...|
+--------------------+

table2maxvalue

+--------------------+
|    max(UPDATE_DATE)|
+--------------------+
|2022-05-02 01:04:...|
+--------------------+

Now, how can I check whether table1maxvalue is greater than table2maxvalue and do something if so? Like:

if(table1maxvalue<table2maxvalue){
Do something
}

Since these are data frames, I am getting this error: value >= is not a member of org.apache.spark.sql.DataFrame

Please suggest.

CodePudding user response:

You are trying to compare one DataFrame to another DataFrame. What you actually need is to reference the first row and then retrieve the scalar value from that row.

In this case you can use the following:

table1maxvalue       // data frame
  .head()            // get the first row
  .getTimestamp(0)   // get the first column as a timestamp (the max(UPDATE_DATE) value)