I would like to select the item that has the greatest value. For example, in this table I would like to select MAC09:
Identifiant | Val |
---|---|
MAC26 | 36 |
MAC10 | 9 |
MAC02 | 2 |
MAC32 | 11 |
MAC09 | 37 |
MAC28 | 10 |
CodePudding user response:
from pyspark.sql.functions import col
df.orderBy(col("Val").desc()).limit(1).show()  # sort by Val descending, keep the top row
CodePudding user response:
There are several ways of doing it; here is a solution using a rank:
from pyspark.sql import functions as F, Window
df.withColumn("rnk", F.rank().over(Window.orderBy(F.col("Val").desc()))).where(
"rnk = 1"
).drop("rnk").show()
+-----------+---+
|Identifiant|Val|
+-----------+---+
|      MAC09| 37|
+-----------+---+