Given a Spark session:
val spark = SparkSession
.builder()
.appName("Spark SQL basic example")
.config("spark.some.config.option", "some-value")
.getOrCreate()
and a dataset loaded from a Delta table:
case class Coords(x: Option[Double], y: Option[Double])

import spark.implicits._                  // needed for .as[Coords]
import org.apache.spark.sql.functions.col

val coords = spark.read.format("delta").load("<...>")
  .select(col("x"), col("y"))
  .as[Coords]
How can I remove the rows where either "x" or "y" is null, and the rows where "y" is below 10?
Many thanks!
CodePudding user response:
val res = coords
  .filter(col("x").isNotNull)  // drop rows where x is null
  .filter(col("y").isNotNull)  // drop rows where y is null
  .filter(col("y") >= 10)      // drop rows where y is below 10

Note that the last filter alone would already exclude rows where "y" is null, because in Spark SQL a comparison against null evaluates to null and the row is filtered out; the explicit isNotNull check is redundant but makes the intent clear.
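For intuition, here is the same predicate applied to a plain Scala collection of the Coords case class, with no Spark dependency (the sample values are made up for illustration):

```scala
case class Coords(x: Option[Double], y: Option[Double])

val coords = List(
  Coords(Some(1.0), Some(12.0)), // kept: x defined, y >= 10
  Coords(None, Some(15.0)),      // dropped: x is None
  Coords(Some(2.0), None),       // dropped: y is None
  Coords(Some(3.0), Some(5.0))   // dropped: y below 10
)

// x must be defined, and y must be defined with a value of at least 10;
// Option.exists is false for None, so it covers both conditions on y
val res = coords.filter(c => c.x.isDefined && c.y.exists(_ >= 10))
// res == List(Coords(Some(1.0), Some(12.0)))
```

Back in Spark, an equivalent shorthand is `coords.na.drop(Seq("x", "y")).filter(col("y") >= 10)`, which drops rows with nulls in the listed columns before applying the range filter.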