So I have this table and I am trying to sort it such that the values of the 'Sex' column are alternating. Below is the table in question
-------------------- ------ ---
| Employee_Name|salary|Sex|
-------------------- ------ ---
| Adinolfi, Wilson K| 62506| M |
|Ait Sidi, Karthik...|104437| M |
| Akinkuolie, Sarah| 64955| F|
| Alagbe,Trina| 64991| F|
| Anderson, Carol | 50825| F|
| Anderson, Linda | 57568| F|
| Andreola, Colby| 95660| F|
| Athwal, Sam| 59365| M |
| Bachiochi, Linda| 47837| F|
| Bacong, Alejandro | 50178| M |
|Baczenski, Rachael | 54670| F|
| Barbara, Thomas| 47211| M |
| Barbossa, Hector| 92328| M |
|Barone, Francesco A| 58709| M |
| Barton, Nader| 52505| M |
| Bates, Norman| 57834| M |
| Beak, Kimberly | 70131| F|
| Beatrice, Courtney | 59026| F|
| Becker, Renee|110000| F|
| Becker, Scott| 53250| M |
-------------------- ------ ---
The question posed to me is to write a statement such that :
--- -------
|sex|EMpName|
--- -------
| M |Kevin |
| F |Carol |
| M |Josh |
| F |Linda |
| M |Sam |
| F |Sam |
--- -------
Please help, any hints or concepts would be appreciated.
CodePudding user response:
you need to add an "order" column to get expected result. Here is a solution with row_number
.
from pyspark.sql import functions as F, Window
# assuming df is your dataframe
df.withColumn(
"ordering",
F.row_number().over(Window.partitionBy("sex").orderBy(F.lit(1))),
).orderBy("ordering", "sex").drop("ordering").show()