Can someone help me understand this behavior:
scala> val l1 = List(84.99F, 9.99F).toDF("dec")
l1: org.apache.spark.sql.DataFrame = [dec: float]
scala> val l2 = List(84.99, 9.99).toDF("dec")
l2: org.apache.spark.sql.DataFrame = [dec: double]
scala> l1.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+
scala> l2.show
+-----+
|  dec|
+-----+
|84.99|
| 9.99|
+-----+
scala> l1.exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+
scala> l1.select('dec.cast("double")).exceptAll(l2).show(false)
+-----------------+
|dec              |
+-----------------+
|9.989999771118164|
|84.98999786376953|
+-----------------+
I understand it's due to the float vs. double column comparison in exceptAll, but how and where does the weird diff come from?
CodePudding user response:
exceptAll requires Spark to widen (cast) the type of l1 to double as well, and such a cast is not necessarily precise, which causes the result you are seeing:
List(84.99F, 9.99F).toDF("dec")
.select('dec.cast("double"))
.show()
+-----------------+
|              dec|
+-----------------+
|84.98999786376953|
|9.989999771118164|
+-----------------+
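The widening itself can be reproduced in plain Scala, since the Float literals do not store 84.99 and 9.99 exactly. As a possible workaround (a sketch, not part of the original answer; it assumes two decimal places are enough for your data), casting both sides to the same DecimalType before exceptAll makes the comparison exact:

// Plain Scala: widening a Float to Double exposes the stored approximation.
84.99F.toDouble  // 84.98999786376953
9.99F.toDouble   // 9.989999771118164

// Sketch of a workaround: cast both columns to decimal(10,2) so the
// float- and double-backed values round to the same exact decimal.
val l1d = l1.select('dec.cast("decimal(10,2)").as("dec"))
val l2d = l2.select('dec.cast("decimal(10,2)").as("dec"))
l1d.exceptAll(l2d).show(false)
// expected to produce an empty result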