An error in groupby function in pyspark code

Time:08-10

I have a dataset for which I was asked to write PySpark code answering the following question:

List the winner of each World Champions Trophy.
Hint: A player's score/result is the total of their results across all rounds of the tournament.
Result attributes: winner, tournament_name

I wrote this code:

game_info = spark.read.load("/content/chess/chess_wc_history_game_info.csv",
                     format="csv", sep=",", inferSchema="true", header="true")

game_info.groupBy('winner').show()

But on execution I got this error:

AttributeError: 'GroupedData' object has no attribute 'show'

Answer:

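This error occurs because groupBy() returns a GroupedData object, not a DataFrame, and GroupedData has no show() method. You first have to apply an aggregation to it, which gives you back a DataFrame you can call show() on. The main methods GroupedData provides are: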

  • count() - Returns the count of rows for each group.
  • mean() - Returns the mean of the numeric columns for each group (an alias of avg()).
  • max() - Returns the maximum of the numeric columns for each group.
  • min() - Returns the minimum of the numeric columns for each group.
  • sum() - Returns the sum of the numeric columns for each group.
  • avg() - Returns the average of the numeric columns for each group.
  • agg() - Computes one or more aggregations at a time.
  • pivot() - Pivots on the distinct values of a column. It returns GroupedData again, so it must be followed by an aggregation.

Answer:

I'd like to add another useful function to @numb's list:

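collect_list() - Collects all the values of a given column for each group into a list. It lives in pyspark.sql.functions and is used inside agg(), not called on GroupedData directly.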

This can help you "see" the groups.

Side note: passing truncate=False to show() prints the table without truncating long values, so you can actually see all of them.

from pyspark.sql.functions import collect_list

game_info.groupBy('winner').agg(collect_list("<column you want to fetch>").alias('group_values')).show(truncate=False)