PySpark - Pull the row and all columns that contains the max value of specific column


I have a Spark DataFrame that looks like this:

df =

   Name  Score Section
     W     26       A
     M     62       A
     Q     69       A
     Y     86       A
     J     16       B
     A     83       B

I want to create a new DataFrame that contains a single row (the row with the max score), so it will look like this:

dataframe_maximum =

     Name  Score Section
      Y     86       A

I know I can use groupBy and agg with max to achieve this. I tried something like the following, but I don't think I quite have it correct:

    from pyspark.sql.functions import max

    dataframe_max = df.groupBy(['Name', 'Score', 'Section']).agg(
        max('Score'))
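
For reference, here is a minimal sketch of one way to finish the groupBy/agg idea: aggregate the max Score separately, then join it back to the original rows. The names max_score and dataframe_maximum are illustrative, and a tie on Score would return more than one row:

    from pyspark.sql import functions as F

    # One-row DataFrame holding only the maximum Score
    max_score = df.agg(F.max("Score").alias("Score"))

    # Keep only the original rows whose Score equals that maximum
    dataframe_maximum = df.join(max_score, on="Score", how="inner")
    dataframe_maximum.show()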

CodePudding user response:

    df.sort("Score", ascending=False).take(1)

That said, sorting is a wide operation, so it might not be efficient on large data.
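
For completeness, a runnable sketch of that approach (variable names are illustrative): take(1) returns a Python list of Row objects, so orderBy plus limit(1) is used here instead to keep the result as a DataFrame:

    from pyspark.sql import functions as F

    # Sort descending by Score and keep the single top row as a DataFrame
    dataframe_maximum = df.orderBy(F.col("Score").desc()).limit(1)
    dataframe_maximum.show()

If the full sort is a concern, the agg-and-join sketch shown under the question avoids ordering the whole DataFrame.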
