How to select the item that has the greatest value in a DataFrame in PySpark?

Time:10-01

I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:

Identifiant Val
MAC26 36
MAC10 9
MAC02 2
MAC32 11
MAC09 37
MAC28 10

CodePudding user response:

from pyspark.sql.functions import col
df.orderBy(col("Val").desc()).limit(1).show()

CodePudding user response:

There are several ways of doing it. Here is a solution using a rank:

from pyspark.sql import functions as F, Window


df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
    "rnk = 1"
).drop("rnk").show()
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+