List of Winners of Each World champions Trophy-CodePudding

Total Result of all rounds of Tournament for that player is considered as that player's Score/Result.

Schema:

 |-- game_id: string (nullable = true)
 |-- game_order: integer (nullable = true)
 |-- event: string (nullable = true)
 |-- site: string (nullable = true)
 |-- date_played: string (nullable = true)
 |-- round: double (nullable = true)
 |-- white: string (nullable = true)
 |-- black: string (nullable = true)
 |-- result: string (nullable = true)
 |-- white_elo: integer (nullable = true)
 |-- black_elo: integer (nullable = true)
 |-- white_title: string (nullable = true)
 |-- black_title: string (nullable = true)
 |-- winner: string (nullable = true)
 |-- winner_elo: integer (nullable = true)
 |-- loser: string (nullable = true)
 |-- loser_elo: integer (nullable = true)
 |-- winner_loser_elo_diff: integer (nullable = true)
 |-- eco: string (nullable = true)
 |-- date_created: string (nullable = true)
 |-- tournament_name: string (nullable = true)

Sample DaraFrame:

 -------------------- ---------- -------- ---------- ----------- ----- ---------------- ---------------- ------- --------- --------- ----------- ----------- --------- ---------- ---------------- --------- --------------------- --- -------------------- --------------- 
|             game_id|game_order|   event|      site|date_played|round|           white|           black| result|white_elo|black_elo|white_title|black_title|   winner|winner_elo|           loser|loser_elo|winner_loser_elo_diff|eco|        date_created|tournament_name|
 -------------------- ---------- -------- ---------- ----------- ----- ---------------- ---------------- ------- --------- --------- ----------- ----------- --------- ---------- ---------------- --------- --------------------- --- -------------------- --------------- 
|86e0b7f5-7b94-4ae...|         1|WCh 2021| Dubai UAE| 2021.11.26|  1.0|Nepomniachtchi,I|       Carlsen,M|1/2-1/2|     2782|     2855|       null|       null|     draw|      null|            draw|     null|                    0|C88|2022-07-22T22:33:...| WorldChamp2021|
|dc4a10ab-54cf-49d...|         2|WCh 2021| Dubai UAE| 2021.11.27|  2.0|       Carlsen,M|Nepomniachtchi,I|1/2-1/2|     2855|     2782|       null|       null|     draw|      null|            draw|     null|                    0|E06|2022-07-22T22:33:...| WorldChamp2021|
|f042ca37-8899-488...|         3|WCh 2021| Dubai UAE| 2021.11.28|  3.0|Nepomniachtchi,I|       Carlsen,M|1/2-1/2|     2782|     2855|       null|       null|     draw|      null|            draw|     null|                    0|C88|2022-07-22T22:33:...| WorldChamp2021|
|f70e4bbc-21e3-46f...|         4|WCh 2021| Dubai UAE| 2021.11.30|  4.0|       Carlsen,M|Nepomniachtchi,I|1/2-1/2|     2855|     2782|       null|       null|     draw|      null|            draw|     null|                    0|C42|2022-07-22T22:33:...| WorldChamp2021|
|c941c323-308a-4c8...|         5|WCh 2021| Dubai UAE| 2021.12.01|  5.0|Nepomniachtchi,I|       Carlsen,M|1/2-1/2|     2782|     2855|       null|       null|     draw|      null|            draw|     null|                    0|C88|2022-07-22T22:33:...| WorldChamp2021|
|58e83255-93bb-4d5...|         6|WCh 2021| Dubai UAE| 2021.12.03|  6.0|       Carlsen,M|Nepomniachtchi,I|    1-0|     2855|     2782|       null|       null|Carlsen,M|      2855|Nepomniachtchi,I|     2782|                   73|D02|2022-07-22T22:33:...| WorldChamp2021|
|29181d93-73f4-4fb...|         7|WCh 2021| Dubai UAE| 2021.12.04|  7.0|Nepomniachtchi,I|       Carlsen,M|1/2-1/2|     2782|     2855|       null|       null|     draw|      null|            draw|     null|                    0|C88|2022-07-22T22:33:...| WorldChamp2021|
|8a4ccd8c-d437-429...|         8|WCh 2021| Dubai UAE| 2021.12.05|  8.0|       Carlsen,M|Nepomniachtchi,I|    1-0|     2855|     2782|       null|       null|Carlsen,M|      2855|Nepomniachtchi,I|     2782|                   73|C43|2022-07-22T22:33:...| WorldChamp2021|
|55a122db-27d1-495...|         9|WCh 2021| Dubai UAE| 2021.12.07|  9.0|Nepomniachtchi,I|       Carlsen,M|    0-1|     2782|     2855|       null|       null|Carlsen,M|      2855|Nepomniachtchi,I|     2782|                   73|A13|2022-07-22T22:33:...| WorldChamp2021|
|1f900d18-5ea3-4f4...|        10|WCh 2021| Dubai UAE| 2021.12.08| 10.0|       Carlsen,M|Nepomniachtchi,I|1/2-1/2|     2855|     2782|       null|       null|     draw|      null|            draw|     null|                    0|C42|2022-07-22T22:33:...| WorldChamp2021|

My code looks like this. I think it's messed up. Am I supposed to do sum somewhere?

winners = df_history_info.filter(df_history_info['winner'] != "draw").groupBy("tournament_name").agg({"winner":"max"}).show()

I'm getting this result but it is incorrect in many cases.

 --------------- -------------------- 
|tournament_name|         max(winner)|
 --------------- -------------------- 
| WorldChamp2004|              Leko,P|
| WorldChamp1894|   Steinitz, William|
| WorldChamp2013|     Carlsen, Magnus|
|  FideChamp2000|       Yermolinsky,A|
| WorldChamp2007|           Svidler,P|
|  FideChamp1993|       Timman, Jan H|
|WorldChamp1910b|     Lasker, Emanuel|
| WorldChamp1921|Capablanca, Jose ...|
| WorldChamp1958|    Smyslov, Vassily|
| WorldChamp1981|  Kortschnoj, Viktor|
| WorldChamp1961|         Tal, Mihail|
| WorldChamp1978|  Kortschnoj, Viktor|
| WorldChamp1960|         Tal, Mihail|
| WorldChamp1948|    Smyslov, Vassily|
| WorldChamp1929|    Bogoljubow, Efim|
| WorldChamp1934|    Bogoljubow, Efim|
| WorldChamp1986|      Kasparov, Gary|
|   PCAChamp1995|      Kasparov, Gary|
| WorldChamp1886|Zukertort, Johann...|
| WorldChamp1907|     Lasker, Emanuel|
 --------------- --------------------

CodePudding user response：

Since the winner column contains either the winning player's name or the word "draw" which you've filtered out, then this means the operation .agg({"winner":"max"}) will return the max of a string. This is why Zukertort, Johann... appears as the winner of WorldChamp1886 instead of Steinitz..., and Yermolinksky,A appears as in the winner in the 128 person field in the FideChamp2000.

Here is an example of something you could try with a spark dataframe that looks like the following:

df = spark.createDataFrame(
    [
        ("WC1", "A"),
        ("WC1", "B"),
        ("WC1", "A"),
        ("WC1", "A"),
        ("WC1", "A"),
        ("WC1", "B"),
        ("WC1", "A"),
        ("WC1", "B"),
        ("WC2", "F"),
        ("WC2", "F"),
        ("WC2", "F"),
        ("WC2", "D"),
        ("WC2", "D"),
        ("WC2", "E"),
        ("WC2", "F"),
        ("WC2", "F"),
    ],
    ["tournament_name", "winner"]  # add your column names here
)

And you have a situation like this where you want to determine who wins each tournament by the most number of times their name appears in the winner column.

 --------------- ------ 
|tournament_name|winner|
 --------------- ------ 
|            WC1|     A|
|            WC1|     B|
|            WC1|     A|
|            WC1|     A|
|            WC1|     A|
|            WC1|     B|
|            WC1|     A|
|            WC1|     B|
|            WC2|     F|
|            WC2|     F|
|            WC2|     F|
|            WC2|     D|
|            WC2|     D|
|            WC2|     E|
|            WC2|     F|
|            WC2|     F|
 --------------- ------

You can do a groupby count on tournament_name and winner:

d = df.groupby(["tournament_name","winner"]).count()

And that gives you this pyspark dataframe:

 --------------- ------ ----- 
|tournament_name|winner|count|
 --------------- ------ ----- 
|            WC1|     B|    3|
|            WC1|     A|    5|
|            WC2|     F|    5|
|            WC2|     D|    2|
|            WC2|     E|    1|
 --------------- ------ -----

Then following this example, you could create a WindowSpec object that partitions by tournament_name, and sorts by in descending order of the count column, and apply it to d:

from pyspark.sql.window import Window
from pyspark.sql.functions import col, row_number

windowDept = Window.partitionBy("tournament_name").orderBy(col("count").desc())
d.withColumn("row",row_number().over(windowDept)) \
  .filter(col("row") == 1).drop("row") \
  .show()

Final result:

 --------------- ------ ----- 
|tournament_name|winner|count|
 --------------- ------ ----- 
|            WC1|     A|    5|
|            WC2|     F|    5|
 --------------- ------ -----