I am reading the data from Store table which is in snowflake. I want to pass the date from dataframe maxdatefromtbl to my query in spark sql to filter records. This condition (s"CREATED_DATE!='$maxdatefromtbl'") is not working as expected
var retail = spark.read.format("snowflake").options(options).option("query","Select MAX(CREATED_DATE) as CREATED_DATE from RSTORE").load()
val maxdatefromtbl = retail.select("CREATED_DATE").toString
var retailnew = spark.read.format("snowflake").options(options).option("query","Select * from RSTORE").load()
var finaldataresult = retailnew.filter(s"CREATED_DATE!='$maxdatefromtbl'")
CodePudding user response:
Select a single value from the retail
dataframe to use in the filter.
val maxdatefromtbl = retail.select("CREATED_DATE").collect().head.getString(0)
var finaldataresult = retailnew.filter(col("CREATED_DATE") =!= maxdatefromtbl)
The retail.select("CREATED_DATE")
type is DataFrame, and DataFrame.toString
returns the schema rather than the value of the single row you have. Please see the following example from a Spark shell.
scala> val s = Seq(1, 2, 3).toDF()
scala> s.select("value").toString
res0: String = [value: int]
In first line in the code snipped above, collect()
wraps the dataframe, with a single row in your case, in an array; head
takes the first element of the array, and .getString(0)
gets the value from the cell with at the index 0
as String
. Please see the DataFrame and Row documentation pages for more information.