Home > other >  Display gender in alternating male female format for a table using pyspark
Display gender in alternating male female format for a table using pyspark

Time:10-28

So I have this table and I am trying to sort it such that the values of the 'Sex' column are alternating. Below is the table in question

 -------------------- ------ --- 
|       Employee_Name|salary|Sex|
 -------------------- ------ --- 
| Adinolfi, Wilson  K| 62506| M |
|Ait Sidi, Karthik...|104437| M |
|   Akinkuolie, Sarah| 64955|  F|
|        Alagbe,Trina| 64991|  F|
|    Anderson, Carol | 50825|  F|
|   Anderson, Linda  | 57568|  F|
|     Andreola, Colby| 95660|  F|
|         Athwal, Sam| 59365| M |
|    Bachiochi, Linda| 47837|  F|
|  Bacong, Alejandro | 50178| M |
|Baczenski, Rachael  | 54670|  F|
|     Barbara, Thomas| 47211| M |
|    Barbossa, Hector| 92328| M |
|Barone, Francesco  A| 58709| M |
|       Barton, Nader| 52505| M |
|       Bates, Norman| 57834| M |
|    Beak, Kimberly  | 70131|  F|
| Beatrice, Courtney | 59026|  F|
|       Becker, Renee|110000|  F|
|       Becker, Scott| 53250| M |
 -------------------- ------ --- 

The question posed to me is to write a statement such that :

 --- ------- 
|sex|EMpName|
 --- ------- 
| M |Kevin  |
| F |Carol  | 
| M |Josh   |
| F |Linda  |
| M |Sam    |
| F |Sam    |
 --- ------- 

Please help, any hints or concepts would be appreciated.

CodePudding user response:

you need to add an "order" column to get expected result. Here is a solution with row_number.

from pyspark.sql import functions as F, Window

# assuming df is your dataframe

df.withColumn(
    "ordering",
    F.row_number().over(Window.partitionBy("sex").orderBy(F.lit(1))),
).orderBy("ordering", "sex").drop("ordering").show()
  • Related