In PySpark, the first() function returns the first element in a column; when ignoreNulls is set to True, it returns the first non-null element. The last() function returns the last element in a column; when ignoreNulls is set to True, it returns the last non-null element.
I would like to know whether equivalent methods exist in the Scala Spark environment.
Thank you in advance.
CodePudding user response:
Yes.
A quick look at the documentation gives you first and last: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html#first(columnName:String):org.apache.spark.sql.Column
def first(columnName: String): Column
Aggregate function: returns the first value of a column in a group.
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
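As a minimal sketch of the behavior described in the docs (the DataFrame df and the columns id and value here are hypothetical, and a local SparkSession is assumed):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, last}

val spark = SparkSession.builder().master("local[*]").appName("FirstLastExample").getOrCreate()
import spark.implicits._

// Hypothetical sample data; note the null in the first row of "value".
val df = Seq((1, None: Option[String]), (2, Some("a")), (3, Some("b"))).toDF("id", "value")

df.select(
  first("value", ignoreNulls = true), // skips the null, returns "a"
  last("value", ignoreNulls = true)   // returns "b"
).show()

Keep in mind that, as aggregate functions, first and last are non-deterministic once the data has been shuffled; on a small single-partition dataset like this they follow the input order.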
CodePudding user response:
Yes, it is available in Scala Spark, the same as in PySpark.
import org.apache.spark.sql.functions

df.select(functions.first("col1", ignoreNulls = true),
          functions.last("col2", ignoreNulls = true))
  .show(false)
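Note that first and last also have overloads that take a Column instead of a column name, e.g. functions.first(functions.col("col1"), ignoreNulls = true), which lets them be combined with other column expressions.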
CodePudding user response:
Spark itself is written in Scala, so everything in its Scala API, including first and last in org.apache.spark.sql.functions, can be used directly.