How to select the item that has the greatest value in a PySpark DataFrame?

I would like to select the row that has the greatest value in the Val column. For example, in this table I would like to select MAC09.

Identifiant	Val
MAC26	36
MAC10	9
MAC02	2
MAC32	11
MAC09	37
MAC28	10

CodePudding user response：

from pyspark.sql import functions as F
df.orderBy(F.col("Val").desc()).limit(1).show()

CodePudding user response：

There are several ways of doing it. Here is a solution using rank:

from pyspark.sql import functions as F, Window


df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
    "rnk = 1"
).drop("rnk").show()
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+