In PySpark, the first() function returns the first element in a column; when ignoreNulls is set to True, it returns the first non-null element. The last() function returns the last element in a column; when ignoreNulls is set to True, it returns the last non-null element.
I would like to know whether equivalent methods exist in the Scala Spark environment.
Thank you in advance.
CodePudding user response:
Yes.
A quick look at the documentation gives you first and last: https://spark.apache.org/docs/latest/api/scala/org/apache/spark/sql/functions$.html#first(columnName:String):org.apache.spark.sql.Column
def first(columnName: String): Column
Aggregate function: returns the first value of a column in a group.
The function by default returns the first values it sees. It will return the first non-null value it sees when ignoreNulls is set to true. If all values are null, then null is returned.
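As a minimal sketch of the behavior described in the docs (the DataFrame df and the columns id and value here are hypothetical, and a local SparkSession is assumed):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{first, last}

val spark = SparkSession.builder().master("local[*]").appName("FirstLastExample").getOrCreate()
import spark.implicits._

// Hypothetical sample data; note the null in the first row of "value".
val df = Seq((1, None: Option[String]), (2, Some("a")), (3, Some("b"))).toDF("id", "value")

df.select(
  first("value", ignoreNulls = true), // skips the null, returns "a"
  last("value", ignoreNulls = true)   // returns "b"
).show()

Keep in mind that, as aggregate functions, first and last are non-deterministic once the data has been shuffled; on a small single-partition dataset like this they follow the input order.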
CodePudding user response:
Yes, it is available in Scala Spark, the same as in PySpark.
import org.apache.spark.sql.functions

df.select(functions.first("col1", ignoreNulls = true),
          functions.last("col2", ignoreNulls = true))
  .show(false)
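Note that first and last also have overloads that take a Column instead of a column name, e.g. functions.first(functions.col("col1"), ignoreNulls = true), which lets them be combined with other column expressions.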
CodePudding user response:
Spark itself is written in Scala, so everything in its Scala API, including first and last in org.apache.spark.sql.functions, can be used directly.