spark exceptAll weird behavior

Time:06-30

Can someone help me explain this behavior:

scala> val l1 = List(84.99F, 9.99F).toDF("dec")
l1: org.apache.spark.sql.DataFrame = [dec: float]

scala> val l2 = List(84.99, 9.99).toDF("dec")
l2: org.apache.spark.sql.DataFrame = [dec: double]

scala> l1.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+


scala> l2.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+

scala> l1.exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+

scala> l1.select('dec.cast("double")).exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+

I understand it's due to comparing a float column with a double column in exceptAll, but how and where exactly does the weird diff come from?

CodePudding user response:

exceptAll requires Spark to widen (cast) the float column of l1 to double before comparing. The widening itself is exact, but the stored float is not: a float carries only about 7 decimal digits of precision, so the literal 84.99F is stored as the nearest representable float, and widening that float to double yields 84.98999786376953 rather than 84.99. That is the diff you are seeing:

List(84.99F, 9.99F).toDF("dec")
  .select('dec.cast("double"))
  .show()

+-----------------+
|              dec|
+-----------------+
|84.98999786376953|
|9.989999771118164|
+-----------------+
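
The same effect can be reproduced in plain Scala, without Spark. One possible workaround (a sketch, assuming the l1/l2 DataFrames and the spark-shell implicits from the question) is to narrow l2's double column to float instead, so both sides of exceptAll compare identical float bits:

```scala
// Plain Scala (no Spark needed): widening the stored float to double
// exposes the representation error in the float literal itself.
println(84.99F.toDouble)  // 84.98999786376953
println(9.99F.toDouble)   // 9.989999771118164

// Workaround sketch: cast l2's column down to float before the diff,
// so both sides of exceptAll hold the same float bit patterns.
val fixed = l1.exceptAll(l2.select('dec.cast("float").as("dec")))
fixed.show()  // empty: 84.99 cast to float equals 84.99F exactly
```

Rounding both columns to two decimals before the comparison would work as well; either way, the key is to make both sides agree on one representation before exceptAll compares values.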
