How to pass date values from dataframe to query in Spark /Scala


I am reading data from a store table in Snowflake. I want to pass the date from the dataframe maxdatefromtbl into my Spark SQL query to filter records, but the condition (s"CREATED_DATE!='$maxdatefromtbl'") is not working as expected.

var retail = spark.read.format("snowflake").options(options).option("query","Select MAX(CREATED_DATE) as CREATED_DATE  from RSTORE").load()
val maxdatefromtbl =  retail.select("CREATED_DATE").toString

var retailnew = spark.read.format("snowflake").options(options).option("query","Select *  from RSTORE").load()

var finaldataresult = retailnew.filter(s"CREATED_DATE!='$maxdatefromtbl'")

CodePudding user response:

Select a single value from the retail dataframe to use in the filter.

import org.apache.spark.sql.functions.col

val maxdatefromtbl = retail.select("CREATED_DATE").collect().head.getString(0)
val finaldataresult = retailnew.filter(col("CREATED_DATE") =!= maxdatefromtbl)
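Note that getString(0) assumes the column comes back as a string; if Snowflake's CREATED_DATE is read as a Spark DateType, getString(0) will throw a ClassCastException. A minimal type-safe variant, assuming a DATE column, would be:

import java.sql.Date
import org.apache.spark.sql.functions.col

// Assumes CREATED_DATE is read back as a Spark DateType (java.sql.Date).
val maxDate: Date = retail.select("CREATED_DATE").collect().head.getAs[Date](0)
val finaldataresult = retailnew.filter(col("CREATED_DATE") =!= maxDate)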

The type of retail.select("CREATED_DATE") is DataFrame, and DataFrame.toString returns a string representation of the schema rather than the value of the single row you have. See the following example from a Spark shell.

scala> val s = Seq(1, 2, 3).toDF()
scala> s.select("value").toString
res0: String = [value: int]
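
For comparison, collecting the rows and reading the first cell yields the actual value, as a quick check in the same shell session shows:

scala> s.select("value").collect().head.getInt(0)
res1: Int = 1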

In the first line of the code snippet above, collect() returns the dataframe's rows, a single row in your case, as an array; head takes the first element of that array, and getString(0) gets the value of the cell at index 0 as a String. See the DataFrame and Row documentation pages for more information.
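With the scalar value in hand, your original string-interpolation approach should also work, since $maxdatefromtbl now expands to the date value itself rather than to a schema string. A sketch, assuming the variables defined above:

// maxdatefromtbl now holds the actual date value (e.g. "2021-10-27"),
// so the interpolated SQL-style condition filters as intended.
val finaldataresult = retailnew.filter(s"CREATED_DATE != '$maxdatefromtbl'")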
