Home > Net >  How can i show highest and lowest numbers in two rows itself?
How can i show highest and lowest numbers in two rows itself?

Time:09-26

I wanted the highest number and the lowest number in two rows but I am getting the whole output , should i have to use dense rank or rank window function ?

like so
popular_eco_move=spark.sql("select a.eco,b.eco_name,count(b.eco_name) as number_of_occurance  
from chess_game as a, chess_eco_codes as b where a.eco=b.eco group by a.eco,b.eco_name order 
by 
number_of_occurance desc")
popular_eco_move.show(10)



 --- -------------------- ------------------- 
|eco|            eco_name|number_of_occurance|
 --- -------------------- ------------------- 
|C42|      Petrov Defense|                 64|
|E15|      Queen's Indian|                 56|
|C88|           Ruy Lopez|                 46|
|D37|Queen's Gambit De...|                 44|
|B90|   Sicilian, Najdorf|                 38|
|C67|           Ruy Lopez|                 37|
|B12|   Caro-Kann Defense|                 37|
|C11|              French|                 35|
|C45|         Scotch Game|                 34|
|D27|Queen's Gambit Ac...|                 32|
 --- -------------------- ------------------- 
only showing top 10 rows

Result attributes: eco, eco_name, number_of_occurences Final result will have only two rows

CodePudding user response:

Hello try a with clause to store a query a re-use it like this :

with my_select as
(select a.eco,b.eco_name,count(b.eco_name) as occurance
from `game`.`chess_game` as a, `game`.`chess_eco_codes` as b where a.eco=b.eco
group by a.eco,b.eco_name)
select * from my_select 
where occurance = (select max(occurance) from my_select) 
or occurance = (select min(occurance) from my_select)

CodePudding user response:

If you're using PySpark, you should learn how to write it in its Pythonic way instead of just SQL.

from pyspark.sql import functions as F
from pyspark.sql.window import Window as W

(df
    .withColumn('rank_asc', F.dense_rank().over(W.orderBy(F.asc('number_of_occurance'))))
    .withColumn('rank_desc', F.dense_rank().over(W.orderBy(F.desc('number_of_occurance'))))
    .where((F.col('rank_asc') == 1) | (F.col('rank_desc') == 1))
    # .drop('rank_asc', 'rank_desc') # to drop these two temp columns 
    .show()
)

 --- -------------------- ------------------- -------- --------- 
|eco|            eco_name|number_of_occurance|rank_asc|rank_desc|
 --- -------------------- ------------------- -------- --------- 
|C42|      Petrov Defense|                 64|       9|        1|
|D27|Queen's Gambit Ac...|                 32|       1|        9|
 --- -------------------- ------------------- -------- --------- 
  • Related