Filtering a spark dataset

Time:07-21

In a Spark session:

val spark = SparkSession
  .builder()
  .appName("Spark SQL basic example")
  .config("spark.some.config.option", "some-value")
  .getOrCreate()

and given the dataset:

import org.apache.spark.sql.functions.col
import spark.implicits._

case class Coords(x: Option[Double], y: Option[Double])

val coords = spark.read.format("delta")
  .load("<...>")
  .select(col("x"), col("y"))
  .as[Coords]

How can I remove the rows where either "x" or "y" is null, as well as those where "y" is below 10?

Many Thanks!

CodePudding user response:

import org.apache.spark.sql.functions.col

val res = coords
  .filter(col("x").isNotNull)
  .filter(col("y").isNotNull)
  .filter(col("y") >= 10) // keeps y >= 10, i.e. drops rows where y is below 10
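Since `coords` is a typed `Dataset[Coords]`, the same condition can also be written as a typed filter over the case class fields. A minimal sketch (the `Coords` definition is repeated from the question so the predicate can be checked on plain values, without a SparkSession):

```scala
case class Coords(x: Option[Double], y: Option[Double])

// Keep a row only if both coordinates are defined and y is at least 10.
def keep(c: Coords): Boolean =
  c.x.isDefined && c.y.exists(_ >= 10)

// On the Dataset this would be:  val res = coords.filter(keep _)

// The predicate behaves the same on an ordinary collection:
val sample = Seq(
  Coords(Some(1.0), Some(12.0)), // kept
  Coords(None, Some(12.0)),      // dropped: x is null
  Coords(Some(1.0), Some(3.0)),  // dropped: y below 10
  Coords(Some(1.0), None)        // dropped: y is null
)
val kept = sample.filter(keep)
```

One trade-off to be aware of: a typed `filter` takes an opaque Scala function, so Catalyst cannot push it down to the Delta scan the way it can with the column-based `col(...)` filters in the answer above.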